Merge pull request #155 from lyskouski/BP-130
[#130] [BP] Benchmarking Tool. Integration tests
lyskouski authored Aug 15, 2023
2 parents 16acdb5 + b0fb43e commit ce51a1c
Showing 58 changed files with 1,042 additions and 289 deletions.
2 changes: 1 addition & 1 deletion docs/implementation-flow/ch04-s01-tests.tex
@@ -498,7 +498,7 @@ \subsubsection{Writing Unit Tests with Wrappers (Code Generators)} \label{ut-cod
\end{lstlisting}


\subsubsection{Adding Behavioral Tests (Gherkin)}
\subsubsection{Adding Behavioral Tests (Gherkin)} \label{t-gherkin}

Improvement cycles never end. Previously (\ref{widget-tests}), we've discussed an approach to testing widgets and applied the
\q{When ... Given ... Then ...}-notation. That notation is a part of Behavior-Driven Development (BDD) -- the process
14 changes: 7 additions & 7 deletions docs/implementation-flow/ch04-s05-consequences.tex
@@ -1,7 +1,7 @@
\subsection{Assessing of Ignorance} \label{ut-fail}

Initially, we've declared the importance of having tests and an ecosystem for their automation, but haven't written a
valuable amount of them (10\% coverage, \ref{a-badges}). That was done to show consequences of such a decision -- not
valuable amount of them (10\% coverage, \ref{a-badges}). That was done to show a consequence of such a decision -- not
to write tests. So, let's measure the mistakes made during our Increment (four Iterations, two weeks each):

\begin{lstlisting}[language=bash]
@@ -12,7 +12,7 @@ \subsection{Assessing of Ignorance} \label{ut-fail}
\end{lstlisting}

\noindent The \q{git log}-command retrieves the commit history (\q{\%ad} - to include the commit date; the \q{--date=iso}-option
converts dates to ISO 8601 [YYY-mm-dd]; \q{\%s} - take subject) with the specified format via \q{--grep} (since we've
converts dates to ISO 8601 [YYYY-mm-dd]; \q{\%s} - take subject) with the specified format via \q{--grep} (since we've
used the \q{[BF]}-prefix in the title of created bug-reports [issues] and used it as a part of the commit message). \q{awk}
extracts the date part from each line and delegates sorting by the extracted dates to the \q{sort}-command. Finally, the
\q{uniq -c}-command counts the occurrences of each unique date. We'll do the same operation for the \q{fix}-keyword.
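
For illustration, such a pipeline might look as follows (a sketch only -- the exact listing is collapsed in this view, so the format string and grep pattern are assumptions derived from the description above):

\begin{lstlisting}[language=bash]
# Count [BF]-tagged commits per day (sketch; adjust --grep for the "fix"-keyword run)
git log --grep="\[BF\]" --date=iso --pretty=format:"%ad %s" \
  | awk '{print $1}' \
  | sort \
  | uniq -c
\end{lstlisting}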
@@ -54,16 +54,16 @@ \subsection{Assessing of Ignorance} \label{ut-fail}

The Test-Driven Development approach has been known since 1999 as part of the Extreme Programming flow, but for a reason unknown to me it is
not widely adopted. The argument that "we do not have time to write tests" is the same as "we won't use a car since we're
already running to reach our 200km target within a day" (instead of an hour). In Agile transformations it's a mantra
already running to reach our 200km target within a day" (instead of a few hours). In Agile transformations it's a mantra
that the usage of Scrum (a communication framework) will increase development flow tenfold. Looking wider,
Agile, DevOps, Lean, and other approaches put an emphasis on quality throughout the process. That is because
communication itself has a natural limit on the achievable performance optimization; the next 10x boost can be reached
only by growing technical excellence. Observing developers dedicating half a day to test a seemingly
"one-hour" change, I find it perplexing that the idea of investing an additional hour in crafting tests is met with
"one-hour" change, we may find it perplexing that the idea of investing an additional hour in crafting tests is met with
resistance.

As an example, by achieving technical excellence through a semaphore approach ("red" - write a test for the missing
part of the code, assert expectations; "yellow" - write code to pass the tests; "green" - refactor your code) the
stabilization phase, "monthly" regression testing, even QA Department won't be needed. All acceptance criteria for
user story, feature, and even epic are transliterated into tests, and controlled by automation. That reinforces the
developer's mental model of the code, boosts confidence and increases productivity.
stabilization phase, "monthly" regression testing, even a separate QA Department won't be needed. All acceptance
criteria for user story, feature, and even epic are transliterated into tests, and controlled by automation. That
reinforces the developer's mental model of the code, boosts confidence and increases productivity.
2 changes: 0 additions & 2 deletions docs/implementation-flow/ch05-features.tex
@@ -1,6 +1,4 @@
% Copyright 2023 The terCAD team. All rights reserved.
% Use of this content is governed by a CC BY-NC-ND 4.0 license that can be found in the LICENSE file.

\markboth{Unleashing Unparalleled Features}{Unleashing Unparalleled Features}

[TBD]
269 changes: 269 additions & 0 deletions docs/implementation-flow/ch05-s01-tests.tex
@@ -0,0 +1,269 @@
% Copyright 2023 The terCAD team. All rights reserved.
% Use of this content is governed by a CC BY-NC-ND 4.0 license that can be found in the LICENSE file.

\subsection{Benchmarking Prototype}
\markboth{Unleashing Features}{Benchmarking Prototype}

Before adding functionality in the form of muscles to the created prototype skeleton, we need to verify its reliability.
Restructuring the fundamental concepts of the application in the future would not only pose a considerable challenge
but also entail substantial effort and potential complications.


\subsubsection{Providing Integration Tests}

Unit tests (\ref{ut-unit}) and widget tests (\ref{widget-tests}) serve as valuable tools for assessing isolated classes,
functions, or widgets. However, not all problems can be tackled by them. Integration tests are used to identify
systemic flaws (data corruption, concurrency problems, miscommunication between services, etc.) that might not be
evident in unit tests; by verifying the synergy of individual assets, they validate the application as a whole.
Integration tests are designed to reflect the real-time performance of an application on an actual device or platform.
In conclusion, they provide a vital link in the testing hierarchy by validating the cooperation of various components
within an application. In that way, integration tests simulate the end-to-end user workflows that we've implemented and
discussed earlier -- \ref{t-gherkin}.

Integration tests in Flutter can be written by using the \q{integration\_test}-package, while the \q{flutter\_driver}-package
helps us to evaluate our tests on real / virtual devices and environments and to track the timeline of test execution
(both packages are provided by the SDK):

\begin{lstlisting}[language=yaml]
## ./pubspec.yaml
dev_dependencies:
  integration_test:
    sdk: flutter
  flutter_driver:
    sdk: flutter
\end{lstlisting}

\noindent The implementation's difference from a widget test is the usage of the following code line, which enables test
execution on a physical device or platform:
\begin{lstlisting}
IntegrationTestWidgetsFlutterBinding.ensureInitialized();
\end{lstlisting}
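
\noindent As a minimal sketch (the file location, the application's entry import, and the asserted label are hypothetical,
taken here only for illustration), an integration test then looks just like a widget test with that binding enabled:

\begin{lstlisting}
// ./integration_test/start_page_test.dart -- a hypothetical example
import 'package:flutter_test/flutter_test.dart';
import 'package:integration_test/integration_test.dart';

import 'package:app_finance/main.dart' as app; // assumed application entry point

void main() {
  // Run on a real device / platform instead of the headless test environment
  IntegrationTestWidgetsFlutterBinding.ensureInitialized();

  testWidgets('Open the starting page', (WidgetTester tester) async {
    app.main();
    await tester.pumpAndSettle();
    // 'Initial Setup' is the first component named in our behavioral scenarios
    expect(find.text('Initial Setup'), findsOneWidget);
  });
}
\end{lstlisting}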


\subsubsection{Doing Performance Testing}

Performance testing is a type of software testing designed to evaluate the speed, responsiveness, stability, and
overall performance of an application under different conditions. It involves subjecting the application to
simulated workloads and stress scenarios to assess how it behaves in terms of speed, scalability, and resource usage.
Performance testing ensures that the software can handle the expected load without degradation in performance.

By simulating different levels of user traffic, performance testing helps determine the application's scalability by
assessing resource utilization (CPU, memory, network bandwidth, and other parameters), and it identifies performance
bottlenecks, such as slow database queries, inefficient code, or network latency, so that these issues can be addressed
before they impact users.

Detailed information about performance testing can be found in the materials of the International Software Testing
Qualifications Board (ISTQB) or the Software Engineering Institute (SEI); here we'll highlight only the definitions of
its types (\cite{Ian15}, \cite{Sag16}, \cite{Sag23}):
\begin{itemize}
\item Load Testing: Evaluates how an application performs under expected load conditions. It helps determine the
application's response time, resource utilization, and overall stability.

\item Stress Testing: Pushes the application to its limits by subjecting it to extreme conditions, such as excessive
user loads or resource scarcity. It aims to identify the breaking point and understand how the application recovers
from failures.

\item Endurance Testing: Assesses the application's performance over an extended period to identify issues related to
memory leaks, resource exhaustion, or gradual degradation in performance.

\item Spike Testing: Simulates sudden spikes in user traffic to assess how the application responds to rapid changes
in load. This helps uncover bottlenecks and issues related to sudden surges in demand.

\item Volume Testing: Focuses on testing the application's performance with large volumes of data, such as a high
number of records in a database. It helps identify scalability and performance issues associated with data volume.
\end{itemize}

\noindent Back to our process, the following command would be used to run the performance tests:

\begin{lstlisting}[language=bash]
# Precondition for Web profiling
chromedriver --port=4444
# Launch tests
flutter drive \
--driver=test_driver/perf_driver.dart \
--target=test/performance/name_of_test.dart \
--profile
\end{lstlisting}

The \q{--profile}-option enables compilation of the application in "profile mode", which helps the benchmark results to be
closer to what end users will experience. When running on a mobile device or emulator, it's proposed to use the
\q{--no-dds}-parameter in addition, which disables the inaccessible Dart Development Service (DDS). The \q{--target}-option
declares the scope of test execution, while the \q{--driver}-option tracks the outcomes. The driver configuration can be
taken from \href{https://docs.flutter.dev/cookbook/testing/integration/profiling}{https://docs.flutter.dev/cookbook/testing/integration/profiling}:

\begin{lstlisting}
// ./test_driver/perf_driver.dart
import 'package:flutter_driver/flutter_driver.dart' as driver;
import 'package:integration_test/integration_test_driver.dart';

Future<void> main() {
  return integrationDriver(
    responseDataCallback: (data) async {
      if (data != null) {
        final timeline = driver.Timeline.fromJson(data['timeline']);
        final summary = driver.TimelineSummary.summarize(timeline);
        await summary.writeTimelineToFile(
          'timeline',
          pretty: true,
          includeSummary: true,
          destinationDirectory: './coverage/',
        );
      }
    },
  );
}
\end{lstlisting}

\noindent Since it's a widget-test-based approach (\ref{widget-tests}, \ref{t-gherkin}), we'll focus only on the
usage of the \q{traceAction}-method to store time-based metrics:

\begin{lstlisting}
// ./test/performance/load/creation_test.dart
void main() {
  final binding = IntegrationTestWidgetsFlutterBinding.ensureInitialized();
  testWidgets('Cover Starting Page', (WidgetTester tester) async {
    await binding.traceAction(
      () async {
        // ... other steps
        final amountField = find.byWidgetPredicate((widget) {
          return widget is TextField && widget.decoration?.hintText == 'Set Balance';
        });
        await tester.ensureVisible(amountField);
        await tester.tap(amountField);
        // In profiling mode some delay is needed:
        await tester.pumpAndSettle(const Duration(seconds: 1));
        // await tester.pump();
        await tester.enterText(amountField, '1000');
        await tester.pumpAndSettle();
        expect(find.text('1000'), findsOneWidget);
        // ... other steps
      },
      reportKey: 'timeline',
    );
  });
}
\end{lstlisting}

\noindent The generated \q{timeline.timeline.json}-file can be traced via \q{chrome://tracing/} in the Google Chrome browser
(\cref{img:perf-chrome-tracing}):

\img{features/perf-chrome-tracing}{Google Chrome -- performance trace}{img:perf-chrome-tracing}

\noindent The \q{timeline.timeline\_summary.json}-file can be opened in an IDE as a native \q{JSON}-file to analyze the
application's performance manually. For example, the value of the \q{average\_frame\_build\_time\_millis}-parameter
is recommended to be below 16 milliseconds to ensure that the app runs at 60 frames per second without glitches. Other
parameters are widely described on the page --
\href{https://api.flutter.dev/flutter/flutter\_driver/TimelineSummary/summaryJson.html}{https://api.flutter.dev/flutter/flutter\_driver/TimelineSummary}.
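
\noindent For a quick check without opening the file, that parameter can be extracted from the summary directly (a sketch
assuming the \q{jq}-utility is installed and the \q{./coverage/} output path from our driver configuration above):

\begin{lstlisting}[language=bash]
# Print the average frame build time (milliseconds) from the generated summary
jq '.average_frame_build_time_millis' ./coverage/timeline.timeline_summary.json
\end{lstlisting}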


\paragraph{Load Testing}
Check response time and resource utilization for the first run (Initial Setup) by creating an account and a budget
category:

\begin{lstlisting}[language=cucumber]
@start
Feature: Verify Initial Flow
  Scenario: Applying basic configuration through the start pages
    Given I am firstly opened the app
    Then I can see "Initial Setup" component
    When I tap "Save to Storage (Go Next)" button
    Then I can see "Acknowledge (Go Next)" component
    When I tap "Acknowledge (Go Next)" button
    Then I can see "Create new Account" component
    When I tap on 0 index of "ListSelector" fields
    And I tap "Bank Account" element
    And I enter "New Account" to "Enter Account Identifier" text field
    And I enter "1000" to "Set Balance" text field
    And I tap "Create new Account" button
    Then I can see "Create new Budget Category" component
    When I enter "New Budget" to "Enter Budget Category Name" text field
    And I enter "1000" to "Set Balance" text field
    When I tap "Create new Budget Category" button
    Then I can see "Accounts, total" component
\end{lstlisting}

\noindent What we've identified from our first test execution is a degraded \q{frame build}-parameter
(\cref{tb:frame-build}) that affects our frames per second (FPS): with an average frame build time of roughly
26 milliseconds, well above the 16-millisecond budget, only 37 frames are generated instead of 60:\\

\begin{table}[h!]
\begin{tabular}{ |p{6.8cm}||r|r|r| }
\hline
\multicolumn{4}{|c|}{Frame Build Time, in milliseconds} \\
\hline
Type of state & Cold Start & Retrial & With Data\\
\hline
average & 26.00 & 24.28 & 29.65 \\
90th percentile & 47.20 & 43.38 & 70.33 \\
99th percentile & 158.31 & 159.41 & 198.03 \\
\hline
\end{tabular}
\caption{Performance Test Results for Feature "Verify Initial Flow"} \label{tb:frame-build}
\end{table}

\img{features/perf-slow-frame}{Performance Monitor in Visual Studio Code}{img:perf-slow-frame}

\noindent This issue (\cref{img:perf-slow-frame}) pertains to compilation jank in animations caused by shader
compilation (shaders are code snippets executed on a graphics processing unit [GPU] to render a sequence of draw commands).
A shader pre-compilation strategy mitigates the compilation-related disruptions during subsequent animations and improves
frames-per-second rendering. Run the app with \q{--cache-sksl} turned on to capture shaders in SkSL:

\begin{lstlisting}[language=bash]
flutter run --profile --cache-sksl --purge-persistent-cache
\end{lstlisting}

\noindent Then warm up the shaders in Skia Shader Language (SkSL) format and bundle them into the application build:

\begin{lstlisting}[language=bash]
# Capture shaders in Skia Shader Language (SkSL) format into a file
flutter drive --profile --cache-sksl --write-sksl-on-exit sksl.json -t test_driver/warm_up.dart
# Build app with SkSL warm-up
flutter build ios --bundle-sksl-path sksl.json
\end{lstlisting}

\begin{lstlisting}
// ./test_driver/warm_up.dart
import 'package:integration_test/integration_test_driver.dart';
Future<void> main() {
  return integrationDriver();(*@ \stopnumber @*)
}

// ./test_driver/warm_up_test.dart
Future<void> main() async {
  IntegrationTestWidgetsFlutterBinding.ensureInitialized();
  SharedPreferencesMixin.pref = await SharedPreferences.getInstance();

  testWidgets('Warm-up', (WidgetTester tester) async {
    await tester.pumpWidget(MultiProvider(
      providers: [
        ChangeNotifierProvider<AppData>(
          create: (_) => AppData(),
        ),
        ChangeNotifierProvider<AppTheme>(
          create: (_) => AppTheme(ThemeMode.system),
        ),
      ],
      child: const MyApp(),
    ));
    await tester.pumpAndSettle(const Duration(seconds: 3));
  });
}
\end{lstlisting}

\noindent Finally, we've taken \q{56 FPS (average)} as an outcome of that tuning.


\paragraph{Stress Testing}
Check the initial load (the time before interaction becomes enabled) with a huge transaction log history (32MB, 128MB,
512MB, 2GB).


\paragraph{Endurance Testing}
Check response time and resource utilization by adding different types of data within different time
periods (15 minutes, an hour, 4 hours, 16 hours).


\paragraph{Spike Testing}
Postponed until synchronization between different devices is enabled.


\paragraph{Volume Testing}
Combine reporting of "Load Testing" with data from "Stress Testing".
14 changes: 8 additions & 6 deletions docs/implementation-flow/index.tex
@@ -39,6 +39,7 @@
\crefname{table}{Table}{Tables}
\usepackage{multicol}
\usepackage{pgfplots}
\usepackage{tabularx}

\usepackage{_lib/customization}
\usepackage{_lib/code-style}
@@ -114,20 +115,21 @@ \section{[WIP] Implementing Core Functionality}
\include{./ch03-s02-subscription}

\newpage
\section{[WIP] Defining Quality Gates}
\section{Defining Quality Gates}
\input{./ch04-quality-gates}
\input{./ch04-s01-tests}
\include{./ch04-s02-automation}
\input{./ch04-s02-automation}
\input{./ch04-s03-telemetry}
\include{./ch04-s04-deployment}
\include{./ch04-s05-consequences}
\input{./ch04-s04-deployment}
\input{./ch04-s05-consequences}

\newpage
\section{[TBD] Unleashing Features}
\section{[WIP] Unleashing Features}
\input{./ch05-features}
\input{./ch05-s01-tests}

\newpage
\section{[WIP] Optimizing UI/UX Flow}
\section{[TBD] Optimizing UI/UX Flow}
\input{./ch06-s01-autofocus}

\newpage
10 changes: 9 additions & 1 deletion docs/implementation-flow/references.tex
@@ -29,7 +29,15 @@
architecture and assessment", \emph{Packt Publishing}, ISBN 9781788299237, p. 230, May 2017

\bibitem[Suz12]{Suz12} Suzanne Robertson, James Robertson, ``Mastering the Requirements Process: Getting Requirements
Right", \emph{Addison-Wesley Professional}, ISBN 978-0321815743, August 2012
Right", \emph{Addison-Wesley Professional}, ISBN 9780321815743, August 2012

\bibitem[Ian15]{Ian15} Ian Molyneaux, ``The Art of Application Performance Testing: From Strategy to Tools",
\emph{O'Reilly Media}, ISBN 9781491900543, p. 275, January 2015

\bibitem[Sag23]{Sag23} Sagar Deshpande, Sagar Tambade, ``Performance Testing Unleashed: A Journey from Novice to Expert",
\emph{Independently published}, ISBN 9798398536317, p. 102, June 2023

\bibitem[Sag16]{Sag16} Sagar Deshpande, Ravindra Sadaphule, ``Demystifying Scalability", \emph{CreateSpace Independent
Publishing Platform}, ISBN 9781533040510, p. 62, April 2016

\end{thebibliography}
10 changes: 10 additions & 0 deletions integration_test/README.md
@@ -0,0 +1,10 @@
## Tips to evaluate Integration Tests

```
flutter drive \
--driver=test_driver/perf_driver.dart \
--target=integration_test/{name}_test.dart \
--no-dds
```

P.S. Launch Chrome Driver `chromedriver --port=4444` for Web profiling