<?xml version="1.0" encoding="UTF-8"?>
<chapter xmlns:xi="http://www.w3.org/2001/XInclude">
<title>Introduction</title>
<introduction>
<p>The subject we are about to study, Linear Algebra, sounds like it might have something to do with lines and doing algebra with them. This is true if you are willing to think metaphorically<ellipsis /> It might be somewhat closer to the truth if we were to say that Linear Algebra is about learning to understand higher dimensions. We'll be surprisingly far along in our study of the topic before we can precisely define what <q>dimension</q> actually means, but we expect that you have some notion already: the Euclidean plane where we studied Geometry is two-dimensional; the world we live in is three-dimensional; Albert Einstein taught us to view the world not just as space <mdash /> but as space-time <mdash /> a four-dimensional concept. We needn't jump off into super advanced physics (or science fiction for that matter) in order to understand higher dimensionality. Dimension, at least informally, just means the number of real numbers it takes to describe something. Locating a point in three-dimensional space requires three numbers <mdash /> usually <m>x</m>, <m>y</m> and <m>z</m>. If we are keeping track of aircraft, knowing where they are in 3-space is certainly necessary, but it might also be a good idea to keep abreast of <em>which way they are going</em>! To truly understand an aircraft's state, one needs to have six numbers: <m>x</m>, <m>y</m> and <m>z</m>, but also the velocity components <m>x'</m>, <m>y'</m> and <m>z'</m>. This makes the state of an airplane 6-dimensional. Perhaps this is why air traffic controllers make the big bucks.
</p>
<p>The 6-dimensionality of an aircraft's state may seem somewhat artificial. Aren't we really just dealing with two separate 3-dimensional entities?
</p>
<p>In Economics there is a high-dimensional entity known as the Leontief Input-Output model. In
this model the state of an Economic system is described by a large number of real quantities, one for each sector of the economy. In a 1965 Scientific American article Wassily Leontief (who won a Nobel prize for this work) described his model in terms of a <q>toy example</q> where the economy was divided into 82 sectors. Today one could easily develop a Leontief I/O model where the economy was divided up into a million sectors. Perhaps this is why Economists make even bigger bucks.
</p>
<p>When we do Linear Algebra in two dimensions we are indeed talking about lines. One of the classic problems is to figure out whether two lines intersect and if so, where. This is a situation where our ability to visualize things in two dimensions can lead us straight to the answer. That is certainly not the case in a million (or even in six) dimensions. Fortunately, there are calculational techniques that work (and even work fairly quickly on a good computer) in
just about any number of dimensions you may be interested in.
</p>
<p>There are three different ways of looking at linear algebra problems: systems of linear equations, vector equations, and transformations. These three views actually represent the same underlying structure, just in different ways. There are various situations where one of these three viewpoints is preferable, so it is a good idea to be able to switch back and forth between these representations.
</p>
<p>In Section <xref ref="section-start" /> we will look at the same (really easy) problem from each of these 3 perspectives.
</p>
</introduction>
<section xml:id="section-start">
<title>Getting started</title>
<p>The first problem we're going to look at is fairly trivial. I bet you can solve this in your head:</p>
<blockquote>
<p>I'm thinking of two numbers <m>x</m> and <m>y</m>. Their sum is 42, and their difference is 6. What are they?</p>
</blockquote>
<p>This word problem can be instantly translated into a pair of equations. Later, when we have more sophisticated problems there may be many more unknown quantities and there may be many more equations. Here we are dealing with a system of equations having 2 equations in 2 unknowns.
<md>
<mrow>x+y=42</mrow>
<mrow>x-y=6</mrow>
</md>
</p>
<p>This one is about as easy as a system of two equations in two variables can get. Actually, that's not quite true. The <em>easiest</em> form for a system of two equations in two unknowns is when the equations are basically just statements of the answer, like:
<md>
<mrow>x=24</mrow>
<mrow>y=18.</mrow>
</md>
Solving a system of equations just means (somehow) transforming it from something like the first form to something like this latter form.
</p>
<p>There are a small number of simple procedures that we can apply to systems without affecting their solutions. We can use these operations to convert almost any system into one that looks like that latter form (each equation just states what the value of some variable is). We'll get around to the full story in Section <xref ref="section-sys_eqs" />, but for now, notice that if we add the two original equations together (adding equations means adding left sides and adding right sides separately) we get something that only involves <m>x</m>. And of course, once we know one of the variables it isn't very hard to find the other.
</p>
<p>For this example problem, finding the solution was very easy. There are more difficult systems where finding the solution by hand would be challenging, so we are going to want to become familiar with computer tools for automating these things. In this book, we'll be using Sage, a free, open-source computer algebra system developed by William Stein. Here is a sample of how Sage can be used to solve a system of equations:
</p>
<sage>
<input>
x, y = var('x, y')
solve([x+y==42, x-y==6], x, y)
</input>
<output>
[[x == 24, y == 18]]
</output>
</sage>
<p>We glossed over a small but important issue in the above. How do we know that our answer was the only answer? And for that matter, is it necessarily true that there must <em>be</em> an answer to some system of equations? These are what are known as existence and uniqueness questions: Does there exist an answer to our problem? (Existence.) And, if there <em>is</em> an answer, how do we know it is the only answer? (Uniqueness.) There are systems of equations where all of the possible behaviors are exhibited: no solutions, unique solutions and lots of solutions.
</p>
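<p>If you'd like to see these behaviors in Sage, something like the following should do the trick. (The details of the output, such as the name Sage invents for a free parameter, may vary from version to version.)
</p>
<sage>
<input>
x, y = var('x, y')
# A system with no solutions: the equations describe parallel lines.
print(solve([x + y == 1, x + y == 2], x, y))
# A system with lots of solutions: both equations describe the same
# line, so Sage reports a one-parameter family of solutions.
print(solve([x + y == 2, 2*x + 2*y == 4], x, y))
</input>
<output>
[]
[[x == -r1 + 2, y == r1]]
</output>
</sage>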
<exercise>
<statement>
<p>Explain why the following system has no solutions at all.
<md>
<mrow>2x - y = 7</mrow>
<mrow>2y - 4x = 8.</mrow>
</md>
</p>
</statement>
<hint>
<p>Put both equations into slope-intercept form (<m>y = mx + b</m>).</p>
</hint>
</exercise>
<p>
That was a linear algebra problem seen from the <q>systems of equations</q> perspective. We still need to look at the <q>vector equations</q> and <q>transformations</q> viewpoints. So next we'll look at a question of the vector flavor. We're going to think about playing chess, not on a board, but on the infinite <m>x</m><mdash /><m>y</m> plane.
</p>
<p>
Consider the piece known as a bishop. If you're not familiar with chess, this is the piece that can move in the diagonal directions. Think of the bishop as having two moves that it can do (but it can do them any number of times). It can do a move we'll refer to as UR; move one unit in the <m>x</m> direction while simultaneously moving one unit in the <m>y</m> direction <mdash /> by doing this multiple times the bishop can travel in the upper right direction. It also has a move that allows it to travel along the other diagonal <mdash /> move one unit in the <m>x</m> direction while simultaneously moving negative one unit in the <m>y</m> direction. We'll call that move LR.
</p>
<p>For those who are familiar with chess, you'll know that bishops are forever trapped on the same color square <mdash /> one of your bishops is always on black and the other always on white. This means that some <q>bishop moving questions</q> won't have solutions <mdash /> for example, a bishop sitting at the origin, <m>(0, 0)</m>, can never move to <m>(0, 5)</m>; those squares have opposite colors! To get around this limitation we're going to let our bishops make fractional moves. For instance if it starts at the origin and makes <m>1/2</m> of the upper-right move then it will arrive at <m>(1/2, 1/2)</m>. Now, getting a little stranger, we're going to also allow our bishops to make negative moves. Maybe we should think of a negative move as <q>undoing</q> a regular move<ellipsis />
</p>
<p>In any case negative moves allow us to move the bishop in the opposite directions along the diagonals. Finally, we may as well give our bishops the freedom to move <em>any amount</em> <mdash /> that is, any real number can be used as a so-called scalar, shrinking or stretching either of the two basic moves. Got it?
We can do things like <m>\pi \cdot UR</m> and <m>\sqrt{2} \cdot LR</m>.
</p>
<p>So, after all that setup, here's the question: If a bishop starts at <m>(0,0)</m>, can it make some number of UR and LR moves and wind up at <m>(42,6)</m>? If so, how many URs and how many LRs?
</p>
<p>The things we've been calling UR and LR are <em>vectors</em>. If you ask someone from the physical sciences to define a vector they'll say <q>it's a thing that has both a magnitude and a direction</q>.
(Which is fine as far as it goes.) Meteorology provides some nice examples. A weather map often shows a lot of basic data about the conditions at various places <mdash /> wind, temperature, barometric pressure and humidity are common. Of these, only the wind is a vector quantity; it needs to be specified with both a magnitude and a direction (<eg /> 15 mph out of the Northeast), while the others all just have magnitudes.
</p>
<p>There is a different way of thinking about what a vector is, that is preferable in many circumstances. A vector is the difference between two positions. Let me put this another way: a vector gives you a set of <em>directions</em> to go from one point to another. (I mean <q>directions</q> in the sense of the things someone tells you if you ask <q>How do I get to the Kwik-E-Mart from here?</q>)
</p>
<p>If you are currently at the point <m>(3,4)</m> and you want to move to the point <m>(5,12)</m> you
need to increase your <m>x</m>-coordinate by 2 units and you must increase your <m>y</m>-coordinate by 8 units. We just described the vector <m>\langle 2, 8 \rangle</m>; the numbers <m>2</m> and <m>8</m> are known as the components of the vector. Note that this is different in a not-so-subtle way from the <em>point</em> <m>(2,8)</m>. The point is stationary; the vector is there to describe a change. If you start at the origin and follow the directions specified by the vector <m>\langle 2, 8 \rangle</m> you will of course wind up at the point <m>(2,8)</m>, but if you start at some other point, it's equally obvious that you won't! Sometimes people will talk about <q>position vectors</q> in this sort of context <mdash /> the position vector <m>\langle x,y \rangle</m> goes from the origin to the point <m>(x,y)</m>. Generally, it is preferable to keep the distinction between points and vectors clear. When you treat a vector as a position vector (i.e. think of it as a point) you are losing something. Ordinarily a vector is free; it can be slid around from one point to another so long as its components aren't changed.
</p>
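<p>As a quick sanity check, we can compute that vector in Sage by subtracting the position vectors of the two points:
</p>
<sage>
<input>
# The directions from (3,4) to (5,12): subtract componentwise.
vector([5, 12]) - vector([3, 4])
</input>
<output>
(2, 8)
</output>
</sage>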
<p>Here's how solving the vector variant of our problem might look in Sage:
</p>
<sage>
<input>
x, y = var('x, y')
u = vector(QQ, [1, 1])     # the UR move
v = vector(QQ, [1, -1])    # the LR move
lhs = x*u + y*v            # x URs plus y LRs
rhs = vector(QQ, [42, 6])  # where we want to end up
solve([lhs[0] == rhs[0], lhs[1] == rhs[1]], x, y)
</input>
<output>
[[x == 24, y == 18]]
</output>
</sage>
<p>So, at this point we've looked at a simple linear algebra problem from the systems of equations perspective and from the vector equations perspective. The final perspective we want to illustrate is that of linear transformations.
</p>
<p>Basically, a linear transformation is a function that takes vectors as inputs and spits out vectors as outputs. You're probably familiar with the following sort of diagram for functions.
</p>
<figure>
<image source="images/function_diagram" width="40%" />
</figure>
<p>In Multivariable Calculus you may also encounter functions that are diagrammed like so:
</p>
<sidebyside width="40%" margins="auto">
<image source="images/function_diagram2" />
<image source="images/function_diagram3" />
</sidebyside>
<p>The first is a real-valued function of two variables <mdash /> think of it as taking a vector as input and returning a scalar. The second is a vector-valued function of a single real variable. The mapping that gives temperature as a function of position on a metal plate is an example of the first sort. When we represent the position of a particle moving around in space (as a function of time) we are using the second sort.</p>
<p>Linear transformations are functions where there are vectors on both the input and the output side.
</p>
<figure>
<image source="images/function_diagram4" width="70%" />
</figure>
<p>Moreover, linear transformations are <em>linear</em>, which means the components of the output are computed in a very simplistic way from the components of the inputs. The only things that are allowed are adding things up and multiplying by constants.
</p>
<p>So let's give an example of a linear transformation. This will be a function that takes a vector <m>\langle x, y \rangle</m> as input, and returns a vector <m>\langle u, v \rangle</m> as output. We will compute <m>u</m> and <m>v</m> (the components of the output vector) from <m>x</m> and <m>y</m> (the components of the input vector) by <q>adding things up and multiplying by constants</q>:
<md>
<mrow>u = x+y</mrow>
<mrow>v = x-y</mrow>
</md>
</p>
<p>By convention, people usually call a linear transformation <m>T</m> and use a notation that looks just like Euler notation for functions (because in fact, that's what it is!)
<md>
<mrow> T( \langle x,y \rangle ) = \langle u, v \rangle . </mrow>
</md>
There are two kinds of problems one can ask: maybe you know the input vector and you'd like to find the output vector, or vice versa. When you've got the input it's very easy to find the output! You just plug in. The more interesting question is the reverse: suppose you know that <m>\langle u, v \rangle = \langle 42, 6 \rangle</m>; how can you arrive at the solution <m> \langle x,y \rangle = \langle 24, 18 \rangle </m>? We'll be looking at this kind of thing in more depth in Section <xref ref="section-transformations" />.
</p>
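<p>Here is how both directions of this problem might look in Sage. (The function name <m>T</m> is just our choice; we simply encode the rules <m>u = x+y</m> and <m>v = x-y</m>.)
</p>
<sage>
<input>
x, y = var('x, y')
def T(v):
    # the rules u = x + y and v = x - y
    return vector([v[0] + v[1], v[0] - v[1]])
# Input to output: just plug in.
print(T(vector([24, 18])))
# Output to input: solve for x and y.
print(solve([x + y == 42, x - y == 6], x, y))
</input>
<output>
(42, 6)
[[x == 24, y == 18]]
</output>
</sage>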
</section>
<section xml:id="section-sys_eqs">
<title>Systems of equations</title>
<p>In this section we'll look much more closely at the <q>systems of equations</q> approach to linear algebra.</p>
<p>First a few words about notation. When there are seventeen variables in a problem it becomes <em>really</em> awkward to use different letters for each variable. When there are a thousand variables it's impossible! We will follow the almost universal convention that the letter <m>x</m> will be used for the variables, with a subscript to identify which one. If we were to translate the problem from Section <xref ref="section-start" /> into this notation it would become
<md>
<mrow>x_1 + x_2 = 42 </mrow>
<mrow>x_1 - x_2 = 6. </mrow>
</md>
</p>
<p>A <em>linear combination</em> of some set of numbers <m>\{ x_1, x_2, \ldots, x_n \}</m> is created by multiplying each of the <m>x</m>'s by constants and then adding everything up. Of course if the constants are <m>1</m> or <m>-1</m> (as in the previous example) we tend to forget that they're there!
</p>
<example>
<p>Consider <m>x_1 + 2x_2 + 3x_3 + 4x_4 + 5x_5</m>. This is a linear combination of the five variables <m>\{x_1, x_2, x_3, x_4, x_5\}</m>. The constants (<m>1, 2, 3, 4,</m> and <m>5</m>) are called the coefficients of the linear combination.
</p>
</example>
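<p>Evaluating a linear combination is easy to do by machine as well. Here is a quick Sage sketch that evaluates the linear combination above when every variable is set equal to <m>1</m>:
</p>
<sage>
<input>
coeffs = [1, 2, 3, 4, 5]
values = [1, 1, 1, 1, 1]   # set every variable equal to 1
sum(c*v for (c, v) in zip(coeffs, values))
</input>
<output>
15
</output>
</sage>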
<p>An equation is <em>linear</em> if it has the form of a linear combination set equal to some value on the right-hand side <mdash /> or if it can be put into that form. For example
<md>
<mrow> x_1 + 2x_2 + 3x_3 + 4x_4 + 5x_5 = 15 </mrow>
</md>
is a linear equation in five variables.
</p>
<p>Also,
<md>
<mrow> x_1 + 3x_3 = x_2 + x_4 </mrow>
</md>
is a linear equation (in four variables) because we can manipulate it into the form
<md>
<mrow> x_1 - x_2 + 3x_3 - x_4 = 0. </mrow>
</md>
</p>
<exercise>
<statement>
<p>The linear equation
<md>
<mrow> x_1 + 2x_2 + 3x_3 + 4x_4 + 5x_5 = 15 </mrow>
</md>
has a solution where all of the variables are set equal to 1. Are there others?
</p>
</statement>
<hint>
<p>Try setting one of the variables to zero. That essentially eliminates that one and gives you a new equation with only four variables. Does the new equation have a solution?
</p>
</hint>
</exercise>
<p>A <em>system of equations</em> is just a collection of linear equations.</p>
<p>The notation for systems of equations gets a bit complicated when we try to write them in general (that is, without particular values given for the various constants involved). There are three sorts of things that need names in such a system: the variables, the coefficients of the variables, and the numbers on the right-hand sides. There is a convention that is fairly universal for the names and numbering of these elements. The variables are <m>x</m>'s with subscripts, the right-hand sides are <m>b</m>'s with subscripts, and the coefficients are <m>a</m>'s with <em>two</em> subscripts (we need to indicate the equation that a given coefficient is in and also which variable it is multiplying).
</p>
<p>For example, here is how we would write the general form of a system of three equations in four unknowns:
<me>
\begin{alignedat}{5}
\aij{1}{1} x_1 \amp {}+{} \amp \aij{1}{2} x_2 \amp {}+{} \amp \aij{1}{3} x_3 \amp {}+{} \amp \aij{1}{4} x_4 \amp {}={} \amp b_1 \\
\aij{2}{1} x_1 \amp {}+{} \amp \aij{2}{2} x_2 \amp {}+{} \amp \aij{2}{3} x_3 \amp {}+{} \amp \aij{2}{4} x_4 \amp {}={} \amp b_2 \\
\aij{3}{1} x_1 \amp {}+{} \amp \aij{3}{2} x_2 \amp {}+{} \amp \aij{3}{3} x_3 \amp {}+{} \amp \aij{3}{4} x_4 \amp {}={} \amp b_3 \\
\end{alignedat}
</me>
Notice that the indices on the <m>x</m>'s run from 1 to 4, the indices on the <m>b</m>'s run from 1 to 3, and that there are a total of 12 coefficients.
</p>
<example xml:id="ex-invest">
<title>Investments</title>
<statement>
<p>Suppose you have <dollar /><m>10,000</m> that you want to invest in the stock market. After some research you've found three companies that you think will be good investments. SolarCity Corp (SCTY) is trading at about <dollar /><m>20</m> per share. SunPower (SPWR), a solar energy company, is trading at about <dollar /><m>9</m> per share. First Trust Global Wind Energy (FAN) is at about <dollar /><m>12</m> per share. One equation you can immediately write down is
<md>
<mrow>20 x_1 + 9 x_2 + 12 x_3 = 10000,</mrow>
</md>
where <m>x_1</m> is the number of shares of SCTY we will buy, <m>x_2</m> is the number of shares of SPWR, and <m>x_3</m> is the number of shares of FAN.
</p>
<p>If we said nothing further, we'd have just this one equation and there are many possible sets of values for the variables that satisfy it. Notice that there are two broad categories of companies represented in our stock picks <mdash /> solar energy and wind power. Perhaps we'd be wise to split our investment between them based on some rational theory; for the sake of argument, let's say that we've been advised to use a 60/40 split between solar and wind. What was previously a single equation is now two:
<md>
<mrow>20 x_1 + 9 x_2 = 6000,</mrow>
<mrow>12 x_3 = 4000.</mrow>
</md>
</p>
<p>Notice that the second equation uniquely determines the value of <m>x_3</m> but that the other variables still have a bit of freedom. (For instance, notice that we could set either <m>x_1</m> or <m>x_2</m> to <m>0</m>, and the other variable's value would then be uniquely determined; or, of course, we could have some mixture where our <dollar /><m>6,000</m> is split up between the two companies.) As it happens, these two companies are competitors and there is some probability that one will succeed and the other will fail. A wise investor tries to guess what that probability is and <q>hedge</q> their bets on the market. For the sake of argument let's say we think SCTY is three times more likely to come out the winner in this competition. You might be inclined to just buy only the SCTY stock, but that's not what a hedging strategy would indicate <mdash /> you should mix your investments in a proportion that reflects the probabilities involved. As an equation in the <m>x</m>'s we have
<md>
<mrow>20 x_1 = 3 \cdot 9 x_2.</mrow>
</md>
</p>
<p>At this point we've obtained a system of 3 equations in 3 variables which, after manipulating the last one a little bit, looks like the following.
<md>
<mrow>20 x_1 + 9 x_2 = 6000</mrow>
<mrow>12 x_3 = 4000</mrow>
<mrow>20 x_1 - 27 x_2 = 0</mrow>
</md>
</p>
<p>It is usually a good idea to format your systems so that the variables in each equation line up in columns.
<me>
\begin{alignedat}{4}
20 x_1 \amp {}+{} \amp 9 x_2 \amp \amp \amp {}={} \amp 6000 \\
\amp \amp \amp \amp 12 x_3 \amp {}={} \amp 4000 \\
20 x_1 \amp {}-{} \amp 27 x_2 \amp \amp \amp {}={} \amp 0
\end{alignedat}
</me>
</p>
</statement>
<solution>
<p>Now, let's go ahead and figure out what the values of the variables should be. In other words, how many shares of each stock should we purchase?
</p>
<p>First, look at that middle equation. It isn't very complicated, indeed, it basically <em>tells</em> us the value of <m>x_3</m> <mdash /> we just need to divide both sides by 12 to get that <m>x_3 = 333.\overline{3}</m>. Unfortunately, we can't buy fractions of a share of stock so we'll round to <m>333</m>.
</p>
<p>We're somewhat lucky in that the variable <m>x_3</m> doesn't appear in the other equations, but <em>even if it did</em>, we could now substitute the value we just determined for it. Furthermore, at this point, we have no more use for that middle equation; we've used it up in finding the value of <m>x_3</m>. So now we've reduced our problem to a simpler system <mdash /> one that consists of just two equations in the remaining two unknowns.
<me>
\begin{alignedat}{3}
20 x_1 \amp {}+{} \amp 9 x_2 \amp {}={} \amp 6000 \\
20 x_1 \amp {}-{} \amp 27 x_2 \amp {}={} \amp 0
\end{alignedat}
</me>
</p>
<p>If we subtract the first equation from the second we get
<me>
\begin{alignedat}{3}
\amp {}-{} \amp 36 x_2 \amp {}={} \amp -6000
\end{alignedat}
</me>
and this tells us (just divide both sides by <m>-36</m>) the value of <m>x_2</m>.
</p>
<p>What we've determined so far is that <m>x_3 = 333</m> and <m>x_2 = 167</m>. By substituting those values into the very first equation we wrote down we'll be able to find the value of <m>x_1</m>.
</p>
<p>After making those substitutions we get an equation that only has one variable:
<md>
<mrow>20 x_1 + 9 \cdot 167 + 12 \cdot 333 = 10000.</mrow>
</md>
It's child's play to find that the solution is <m>x_1=225</m>.
So in the end we should put in an order for 225 shares of SCTY, 167 shares of SPWR and 333 shares of FAN. Notice that because of rounding we've come up one dollar short of our intended investment.
</p>
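<p>As a check on our arithmetic, we can hand the whole system to Sage. Something like the following should work; notice that Sage reports the exact, unrounded values (<m>500/3 \approx 167</m> and <m>1000/3 \approx 333</m>).
</p>
<sage>
<input>
x1, x2, x3 = var('x1, x2, x3')
solve([20*x1 + 9*x2 == 6000,
       12*x3 == 4000,
       20*x1 - 27*x2 == 0], x1, x2, x3)
</input>
<output>
[[x1 == 225, x2 == 500/3, x3 == 1000/3]]
</output>
</sage>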
</solution>
</example>
<p>A bit more formalism is appropriate now. We'll start with some definitions.
</p>
<definition>
<title>linear system</title>
<index><main>linear system</main>
</index>
<statement><p>A <term>linear system</term>, also known as a <term>system of linear equations</term>, is a collection of <m>m</m> equations in <m>n</m> unknowns of the form
<me>
\begin{alignedat}{5}
a_{11} x_1 \amp {}+{} \amp a_{12} x_2 \amp {}+{} \amp \amp {}\cdots {} \amp \amp {}+{} a_{1n} x_n \amp {}={} \amp b_1 \\
a_{21} x_1 \amp {}+{} \amp a_{22} x_2 \amp {}+{} \amp \amp {}\cdots {} \amp \amp {}+{} a_{2n} x_n \amp {}={} \amp b_2 \\
\amp \amp \amp \amp \amp \vdots \amp \amp \amp \amp \\
a_{m1} x_1 \amp {}+{} \amp a_{m2} x_2 \amp {}+{} \amp \amp {}\cdots {} \amp \amp {}+{} a_{mn} x_n \amp {}={} \amp b_m \\
\end{alignedat}
</me>
</p>
<p>Note that the doubly-indexed quantities (<m>a_{ij}</m>) as well as the singly-indexed quantities (<m>b_i</m>) are real numbers and that the <m>n</m> variables are indicated by <m>x</m>'s (with subscripts).
</p>
</statement>
</definition>
<remark><p>The use of variables with multiple indices in the above definition bears comment. First of all, note that we are trying to deal with the general situation where there is an unknown number of equations (<m>m</m>) in an unknown number of variables (<m>n</m>). Let's consider the <m>b</m>'s first <mdash /> these are the constants that appear on the right-hand sides of the equations, so there are <m>m</m> of them. The situation for the <m>a</m>'s is more complicated. The <m>a</m>'s are the coefficients; they are constant numbers that the variables are multiplied by, and there are two indices on each of them. The first index tells us which equation we are in. The second index matches with the subscript on the variable. For example <m>\aij{14}{23}</m> would be the coefficient of <m>x_{23}</m> in the <m>14</m>th equation in a system.
</p>
</remark>
<p>What does it mean to say we have found an <q>answer</q> to a system of equations? Essentially, it is this: we have found a set of values for the variables that <q>work</q> in all of the equations. Sometimes people say that this set of values <q>satisfies</q> the equations. To be completely clear, what is meant is that if one substitutes these values for the variables in the equations of the system, all of them (the equations) will be true. It is convenient to regard such a set of values as a vector. For example the solution we obtained in Example <xref ref="ex-invest" /> would be regarded as the vector <m>\langle 225, 167, 333 \rangle</m>.
</p>
<definition>
<title>solution sets</title>
<index><main>solution sets</main>
</index>
<statement>
<p>Given a system of <m>m</m> linear equations in <m>n</m> unknowns, the <term>solution set</term> of the system is the set of all vectors of length <m>n</m> that satisfy all <m>m</m> of the equations in the system.
</p>
</statement>
</definition>
<definition>
<title>equivalent systems</title>
<index><main>equivalence of linear systems</main>
</index>
<statement>
<p>Two linear systems are called <term>equivalent</term> if and only if they have identical solution sets.
</p>
</statement>
</definition>
<remark>
<p>The equivalence of linear systems is an example of what is known as an equivalence relation. Equivalence relations are used in theoretical mathematics when we are trying to capture the notion that two things <mdash /> while not <em>actually</em> equal <mdash /> are similar enough that we can treat them as being sort of a junior version of equal<ellipsis />
</p>
<p>For a relationship to earn the title <q>equivalence relation</q> it must have a short list of properties. These properties are certainly true of the ordinary equals sign:
</p>
<p>
<dl>
<li><title>reflexivity</title><p>A relation is reflexive iff all elements are related to themselves.
</p>
</li>
<li><title>symmetry</title><p>A relation is symmetric iff whenever <m>x</m> and <m>y</m> are a pair of elements that are related, then <m>y</m> and <m>x</m> are also a pair that are related. (I.e. the order can always be reversed.)
</p>
</li>
<li><title>transitivity</title><p>Perhaps you've heard the phrase <q>Two things that are equal to a third must be equal to each other.</q> That's the essence of transitivity.
</p>
</li>
</dl>
</p>
<p>
There really is much more that we should say about equivalence relations in general and the consequences that ensue when we can show that some relation is an equivalence relation. We refer the interested reader to chapter 6 in <url href="https://osj1961.github.io/giam/">GIAM</url>. In the remainder of this book we are going to <em>see</em> how very useful the notion of equivalence of linear systems can be. Hopefully this will give you some indication of how useful equivalence relations in general can be!
</p>
<p>One final word about equivalence relations (in general) and the equivalence of linear systems (in particular): It is customary, when introducing this notion, to ask students to come up with a proof that shows that some given relation (in this case, equivalence of linear systems) is indeed an equivalence relation. Such proofs are actually relatively straightforward, but <em>relax</em>, we're going to let you off the hook this time! Showing that equivalence of linear systems is an equivalence relation is actually too easy. What one needs to do is show that it has each of the three properties: reflexivity, symmetry and transitivity. Each of those is an almost immediate consequence of the way this equivalence is defined. We define two systems to be equivalent if and only if they have the same solution set. In other words, equivalence is <em>defined</em> in terms of set equality. Set equality is definitely an equivalence relation, so it has the three properties. Finally, the arguments that show that equivalence of linear systems has the three properties all have the same form: in order to show that the equivalence of linear systems has a property we use the fact that set equality has that property. This is called inheritance.
</p>
</remark>
<p>The general idea is this: there are lots and lots of different linear systems that are equivalent. They all have the same solution set. Some of these systems are in a nice form that allows us to see what the solution set is. Others are not. We need to transform the latter into the former!
</p>
<p>More specifically, there are three operations that can be applied to linear systems which <em>do not have any effect on solution sets</em>. We can apply these three operations in any way we like! We'll just be transforming our linear system into a slightly different one that is equivalent to the original. Finally, you'll see that it is pretty easy to strategize a bit and transform difficult linear systems into the nice sort (where the solution set is very evident) using these three operations.
</p>
<p>The three operations go by many names; we'll refer to them as Reordering, Scaling and Combining. In the next few paragraphs we'll discuss each of them in turn and explain why they don't have an effect on the solution set of a system.
</p>
<p><em>Reordering</em> means what it sounds like. The solution set is determined by checking whether a given solution vector satisfies <em>all</em> of the equations. It is pretty clear that the order that the equations are listed in is of little importance. In many treatments of linear algebra an operation called <q>swapping</q> is used instead <mdash /> swapping two equations is a special (particularly simple) instance of reordering and any more general reordering can be accomplished by a succession of swaps.
</p>
<exercise><title>permutations and swaps</title>
<statement><p>We have placed the letters A through F in sequence below <mdash /> however they are not in the usual (alphabetic) order. Determine a sequence of swaps that will reorder them so that they <em>are</em> in alphabetic order.
<me> DCABFE </me>
</p>
</statement>
<hint><p>There are many ways to proceed, but putting A then B then C <foreign>et cetera</foreign> where they belong using swaps is one possibility. What swap puts A in the first position?
</p>
</hint>
<solution><p>
<md>
<mrow> DCABFE \quad \mbox{(given) \phantom{swap D}} </mrow>
<mrow> ACDBFE \quad \mbox{swap D and A}</mrow>
<mrow> ABDCFE \quad \mbox{swap C and B}</mrow>
<mrow> ABCDFE \quad \mbox{swap D and C}</mrow>
<mrow> ABCDEF \quad \mbox{swap F and E}</mrow>
</md>
</p>
</solution>
</exercise>
<p>Scaling is another operation where it is fairly obvious that there will be no effect on solution sets. Scaling involves multiplying both sides of an equation by some non-zero constant. Very often that non-zero constant will be the reciprocal of the coefficient of one of the variables; scaling by such a constant is useful in solving for that variable. Perhaps it is clear that multiplying both sides of an equation by the same thing will have no impact on what values of the variables satisfy the equation<ellipsis /> But why does the constant need to be non-zero? Multiplying both sides of <em>any</em> equation by <m>0</m> will produce a new equation that looks like <m>0 = 0</m> which is certainly true! In fact, of course, that's what the problem is; if the equation was previously false for some vector of variable values (thus it served to exclude that vector from the solution set) it will now be true. So vectors of variable values that previously were not in the solution set will now be in it <mdash /> that's the sort of thing we are trying to avoid!
</p>
<p>Combining (a.k.a. replacement) is the most difficult of the three operations and, as you might guess, it is also the most useful. Combining consists of adding a multiple of some other equation to a given one. Another way to think of this is that we replace some equation
by <em>itself</em> plus a multiple of some other equation. This is probably why some people call this operation Replacement.
</p>
<p>When we added the equations <m>x+y=42</m> and <m>x-y=6</m>, obtaining the new equation <m>2x=48</m> back in Section <xref ref="section-start" /> we were really doing a <q>Combining</q> operation. By the way, when we divided both sides of that new equation by <m>2</m> we were <q>Scaling.</q>
</p>
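<p>Sage actually lets us perform Combining and Scaling operations directly on symbolic equations; arithmetic on an equation is applied to both sides at once. Here is a sketch (the exact printed form may differ slightly between versions):
</p>
<sage>
<input>
x, y = var('x, y')
eq1 = x + y == 42
eq2 = x - y == 6
print(eq1 + eq2)        # Combining: add left sides and right sides
print((eq1 + eq2) / 2)  # Scaling: divide both sides by 2
</input>
<output>
2*x == 48
x == 24
</output>
</sage>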
<p>We'll close this section by giving an example <mdash /> using the three operations to find the solution of a small linear system.
</p>
<example><title>A small linear system</title>
<statement>
<p>
There is a unique solution to the following system of 3 equations in 3 unknowns. What is it?
<me>
\begin{alignedat}{4}
x_1 \amp {}+{} \amp x_2 \amp {}+{} \amp x_3 \amp {}={} \amp 21 \\
2 x_1 \amp {}-{} \amp x_2 \amp {}+{} \amp x_3 \amp {}={} \amp 12 \\
x_1 \amp {}+{} \amp 3 x_2 \amp {}-{} \amp x_3 \amp {}={} \amp 17
\end{alignedat}
</me>
</p>
</statement>
<solution>
<p>The first thing we'll do is a combining operation. We'll subtract twice the first equation from the second. It will be convenient to develop a shorthand for expressing these operations. This one could be written as <m>E_2 = E_2 - 2E_1</m>.
<me>
\begin{alignedat}{4}
x_1 \amp {}+{} \amp x_2 \amp {}+{} \amp x_3 \amp {}={} \amp 21 \\
\amp {}-{} \amp 3 x_2 \amp {}-{} \amp x_3 \amp {}={} \amp -30 \\
x_1 \amp {}+{} \amp 3 x_2 \amp {}-{} \amp x_3 \amp {}={} \amp 17
\end{alignedat}
</me>
</p>
<p>Next we'll do a similar combining operation to eliminate <m>x_1</m> from the 3rd equation.
This one would be expressed as <m>E_3 = E_3 - E_1</m>.
<me>
\begin{alignedat}{4}
x_1 \amp {}+{} \amp x_2 \amp {}+{} \amp x_3 \amp {}={} \amp 21 \\
\amp {}-{} \amp 3 x_2 \amp {}-{} \amp x_3 \amp {}={} \amp -30 \\
\amp {}+{} \amp 2 x_2 \amp {}-{} \amp 2 x_3 \amp {}={} \amp -4
\end{alignedat}
</me>
</p>
<p>You should take note that we have done something mildly clever in eliminating the occurrences of <m>x_1</m> in the latter two equations. Now we can use them in further combination operations without fear that they will affect terms involving <m>x_1</m>.
</p>
<p>For our next operation let's scale the last equation by <m>1/2</m>; this isn't strictly necessary but it makes things <em>look</em> a little simpler and since every coefficient in the 3rd equation is even we won't end up dealing with fractions <ellipsis /> <m>E_3 = \frac{1}{2} E_3</m>.
<me>
\begin{alignedat}{4}
x_1 \amp {}+{} \amp x_2 \amp {}+{} \amp x_3 \amp {}={} \amp 21 \\
\amp {}-{} \amp 3 x_2 \amp {}-{} \amp x_3 \amp {}={} \amp -30 \\
\amp \amp x_2 \amp {}-{} \amp x_3 \amp {}={} \amp -2
\end{alignedat}
</me>
</p>
<p>We've just cleaned up the 3rd equation so that the first non-zero term in it (the one involving <m>x_2</m>) has a coefficient of 1. This makes equation 3 very useful as a tool for eliminating the variable <m>x_2</m> from other equations, so next we'll do a reordering operation to move it a bit closer to the top of the heap.
<me>
\begin{alignedat}{4}
x_1 \amp {}+{} \amp x_2 \amp {}+{} \amp x_3 \amp {}={} \amp 21 \\
\amp \amp x_2 \amp {}-{} \amp x_3 \amp {}={} \amp -2 \\
\amp {}-{} \amp 3 x_2 \amp {}-{} \amp x_3 \amp {}={} \amp -30 \\
\end{alignedat}
</me>
</p>
<p>Next, let's use what is now equation 2 to eliminate <m>x_2</m> from (the new) equation 3:
<m>E_3 = E_3 + 3 E_2</m>.
<me>
\begin{alignedat}{4}
x_1 \amp {}+{} \amp x_2 \amp {}+{} \amp x_3 \amp {}={} \amp 21 \\
\amp \amp x_2 \amp {}-{} \amp x_3 \amp {}={} \amp -2 \\
\amp \amp \amp {}-{} \amp 4 x_3 \amp {}={} \amp -36 \\
\end{alignedat}
</me>
</p>
<p>Finally, although (again) this isn't strictly necessary, let's scale the 3rd equation so that the coefficient of <m>x_3</m> is 1<ellipsis /> <m>E_3 = \frac{-1}{4} E_3</m>.
<me>
\begin{alignedat}{4}
x_1 \amp {}+{} \amp x_2 \amp {}+{} \amp x_3 \amp {}={} \amp 21 \\
\amp \amp x_2 \amp {}-{} \amp x_3 \amp {}={} \amp -2 \\
\amp \amp \amp \amp x_3 \amp {}={} \amp 9 \\
\end{alignedat}
</me>
</p>
<p>Wait! Should the last sentence really have started with the word <q>Finally</q>? It seems like the system is still pretty complicated. We certainly haven't achieved the simplest possible sort of linear system, but we <em>have</em> turned the original system into a type that is known as <q>triangular</q>. Do you see why? This kind of system is very easy to solve by a process known as back-substitution. The 3rd equation tells you the exact value of the third variable (<m>x_3 = 9</m>), you can then substitute that value into the second equation to obtain <m>x_2 - 9 = -2</m>. So now we can easily see that <m>x_2=7</m>. Hmmm. Now we've got known values for <m>x_2</m> and <m>x_3</m> which we can substitute into the 1st equation to get
<me>
\begin{alignedat}{4}
x_1 \amp {}+{} \amp 7 \amp {}+{} \amp 9 \amp {}={} \amp 21 \\
\end{alignedat}
</me>
</p>
<p>Okay. That's easy, <m>x_1=5</m>.
</p>
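<p>If you'd like to double-check the whole computation, Sage agrees with us:
</p>
<sage>
<input>
x1, x2, x3 = var('x1, x2, x3')
solve([x1 + x2 + x3 == 21,
       2*x1 - x2 + x3 == 12,
       x1 + 3*x2 - x3 == 17], x1, x2, x3)
</input>
<output>
[[x1 == 5, x2 == 7, x3 == 9]]
</output>
</sage>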
</solution>
</example>
</section>
<section xml:id="section-vectors">
<title>Vector equations</title>
<p>We have previously seen the idea of a linear combination of numbers. In this section we will look at forming linear combinations of vectors. The typical problem of the vector equations sort is: can we find the coefficients so that a linear combination of some set of vectors (with those coefficients) is equal to a given vector?
</p>
<p>Recall that when we formed linear combinations of numbers we were allowed to <q>multiply by constants and add things up.</q> So if we are planning to do the same thing with vectors we need to understand what it means to multiply a vector by a constant
and what it means to add vectors.
</p>
<p>We use the term <em>scalar</em> to refer to real numbers, <em>especially</em> when referring to the numbers that we multiply vectors by. Calling them <q>constants</q> is probably not the best plan; both a scalar and a vector can be <em>constant</em> <mdash /> that just means they aren't changing. It's usually more important to distinguish the vectors from the scalars <mdash /> which things have multiple components and which don't? When we think of vectors as <q>those things that have both a direction and a magnitude,</q> the effect of multiplying by a scalar is to leave the direction unchanged, but change the magnitude by scaling it as the scalar indicates. If the scalar is less than 1, the magnitude of the vector will be reduced; if the scalar is greater than 1 it will be increased. Of course, if the scalar is negative the direction <em>will</em> be affected, but in a rather simplistic way: the vector ends up facing the opposite direction.
</p>
<p>When we have an actual vector and a scalar we'd like to multiply it by, the operation we perform is almost the only thing it could be! Just multiply each of the components of the vector by the scalar.
</p>
<definition>
<title>scalar-vector product</title>
<statement>
<p>If <m>\vec{v}</m> is a vector having <m>m</m> components, <m>\vec{v} = \langle v_1, v_2, \ldots , v_m \rangle</m> and <m>s</m> is a scalar, then the <term>scalar multiplication</term> of <m>\vec{v}</m> by <m>s</m> is defined by
<me>s\vec{v} = s \langle v_1, v_2, \ldots , v_m \rangle = \langle sv_1, sv_2, \ldots , sv_m \rangle </me>
</p>
</statement>
</definition>
<remark><p>The scalar-vector product looks rather like a funny version of the distributive law!
</p>
</remark>
<p>The addition of vectors is best thought of in terms of <q>directions</q>. Suppose the directions to get from my house to the Kwik-E-Mart are: <q>go 3 blocks north and 1 block east</q> (call that vector <m>\vec{v}</m>; we might write its component form as <m>\vec{v} = \langle 1, 3 \rangle</m>). Suppose in addition that the directions to go from the Kwik-E-Mart to Moe's Tavern are <q>go 1 block north and 2 blocks west</q>
(let's call this <m>\vec{w} = \langle -2, 1\rangle</m>). The meaning of the vector sum is the vector that describes the change that would be effected if we follow one set of directions followed by the other <mdash /> except we don't have to be slavish about it <mdash /> we don't literally follow the first set of directions and then do the second. The sum is the set of directions that take us directly to Moe's without making a Kwik-E-Mart pit stop.
</p>
<p>When we actually compute vector sums using the component forms of the vectors involved the computation is probably exactly what you would expect: just add up the corresponding components.
</p>
<definition>
<title>vector addition</title>
<statement>
<p>If <m>\vec{v}</m> and <m>\vec{w}</m> are both vectors having <m>m</m> components,
<md>
<mrow>\vec{v} = \langle v_1, v_2, \ldots , v_m \rangle</mrow>
<intertext> and </intertext>
<mrow>\vec{w} = \langle w_1, w_2, \ldots , w_m \rangle</mrow>
</md>
then their <term>vector sum</term> is defined by
<md>
<mrow>\vec{v} + \vec{w} = \langle v_1+w_1, v_2+w_2, \ldots , v_m+w_m \rangle.</mrow>
</md>
</p>
</statement>
</definition>
<remark><p>The addition of vectors is also known as <em>componentwise</em> addition. It's worth pointing out that if two vectors have different numbers of components, adding them together generally doesn't make sense.
</p>
</remark>
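<p>Sage's vectors support both of these operations, so it is easy to experiment. Here is a small sketch using the Kwik-E-Mart vectors from above:
</p>
<sage>
<input>
v = vector([1, 3])    # 1 block east, 3 blocks north
w = vector([-2, 1])   # 2 blocks west, 1 block north
print(2*v)            # scalar multiplication acts on each component
print(v + w)          # vector addition is componentwise
</input>
<output>
(2, 6)
(-1, 4)
</output>
</sage>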
<p>One last definition will be needed to work with vector equations. What does it mean for two vectors to be equal to one another? The answer is probably entirely obvious, but we'll include a formal definition here for completeness.
</p>
<definition>
<title>vector equality</title>
<statement>
<p>If <m>\vec{v}</m> and <m>\vec{w}</m> are two vectors of length <m>m</m> having components
<md>
<mrow>\vec{v} = \langle v_1, v_2, \ldots , v_m \rangle</mrow>
<intertext> and </intertext>
<mrow>\vec{w} = \langle w_1, w_2, \ldots , w_m \rangle</mrow>
</md>
then we say <m>\vec{v}</m> and <m>\vec{w}</m> are <term>equal</term> and write <m>\vec{v} = \vec{w}</m> if and only if for every <m>i</m>, <m>1\leq i \leq m</m>, <m>v_i = w_i</m>.
</p>
</statement>
</definition>
<example><title>a small vector problem</title>
<statement>
<p>Consider the following set of vectors: <m>\langle 1, 1, 0 \rangle</m>,
<m>\langle 1, 1, 1 \rangle</m> and <m>\langle 0, 0, 1 \rangle</m>. Is it possible to find scalars <m>x_1</m>, <m>x_2</m> and <m>x_3</m> so that
<md><mrow>x_1 \langle 1, 1, 0 \rangle + x_2 \langle 1, 1, 1 \rangle + x_3 \langle 0, 0, 1 \rangle = \langle 2, 3, 4 \rangle ?</mrow>
</md>
</p>
</statement>
<solution>
<p>Let's modify the given problem by using the definitions of (first) scalar multiplication (and then) vector addition:
<md><mrow>\langle x_1, x_1, 0 \rangle + \langle x_2, x_2, x_2 \rangle + \langle 0, 0, x_3 \rangle = \langle 2, 3, 4 \rangle .</mrow>
<intertext> and then </intertext>
<mrow>\langle x_1 + x_2, x_1 + x_2, x_2 + x_3 \rangle = \langle 2, 3, 4 \rangle .</mrow>
</md>
</p>
<p>
Now (surprise!) that final form <mdash /> after we use the definition of vector equality <mdash /> becomes a system of three equations in three unknowns.
<me>
\begin{alignedat}{4}
x_1 \amp {}+{} \amp x_2 \amp \amp \amp {}={} \amp 2 \\
x_1 \amp {}+{} \amp x_2 \amp \amp \amp {}={} \amp 3 \\
x_1 \amp {}+{} \amp x_2 \amp {}+{} \amp x_3 \amp {}={} \amp 4
\end{alignedat}
</me>
</p>
<p>This system is different from the other systems we've seen so far. It doesn't have a solution. Its statement includes an impossibility: if <m>x_1</m> and <m>x_2</m> have a sum of <m>2</m> (from the first equation), how can they also have a sum of <m>3</m> (which is what the second equation asserts)? So there simply <em>aren't</em> three numbers which can be used as the coefficients!
</p>
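<p>Sage comes to the same conclusion; an empty list of solutions means the system is inconsistent:
</p>
<sage>
<input>
x1, x2, x3 = var('x1, x2, x3')
solve([x1 + x2 == 2,
       x1 + x2 == 3,
       x1 + x2 + x3 == 4], x1, x2, x3)
</input>
<output>
[]
</output>
</sage>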
</solution>
</example>
<p>Let's make a tiny change to the previous problem. Sometimes small changes have large effects! We'll change the second component in the vector on the right-hand side to a <m>2</m>.
</p>
<example><title>a slightly tweaked vector problem</title>
<statement>
<p>Consider the following set of vectors: <m>\langle 1, 1, 0 \rangle</m>,
<m>\langle 1, 1, 1 \rangle</m> and <m>\langle 0, 0, 1 \rangle</m>. Is it possible to find scalars <m>x_1</m>, <m>x_2</m> and <m>x_3</m> so that
<md><mrow>x_1 \langle 1, 1, 0 \rangle + x_2 \langle 1, 1, 1 \rangle + x_3 \langle 0, 0, 1 \rangle = \langle 2, 2, 4 \rangle ?</mrow>
</md>
</p>
</statement>
<solution>
<p>Notice that since the left-hand side vectors are all the same as before we can reuse our previous work. The final form of the vector equation is
<md>
<mrow>\langle x_1 + x_2, x_1 + x_2, x_2 + x_3 \rangle = \langle 2, 2, 4 \rangle .</mrow>
</md>
</p>
<p>Now, as a system of equations, we have
<me>
\begin{alignedat}{4}
x_1 \amp {}+{} \amp x_2 \amp \amp \amp {}={} \amp 2 \\
x_1 \amp {}+{} \amp x_2 \amp \amp \amp {}={} \amp 2 \\
x_1 \amp {}+{} \amp x_2 \amp {}+{} \amp x_3 \amp {}={} \amp 4
\end{alignedat}
</me>
and the first two equations are identical <mdash /> they no longer cause a contradiction. This system not only has a solution, it has <em>lots</em> of them!
</p>
<p>When one equation is an exact duplicate of the other, is there really any reason to retain both copies in the system? Remember that we are mostly concerned with solution sets to linear systems. Either of the copies of this equation will have the same effect on solution sets. For a given vector, they will both either say <q>Sure! it works for me, put it in the solution set</q> or <q>No way, that vector is <em>not</em> okay with me! It makes me false.</q> So, from the perspective of solution sets, this system is really just a system of two equations in three unknowns.
<me>
\begin{alignedat}{4}
x_1 \amp {}+{} \amp x_2 \amp \amp \amp {}={} \amp 2 \\
x_1 \amp {}+{} \amp x_2 \amp {}+{} \amp x_3 \amp {}={} \amp 4
\end{alignedat}
</me>
</p>
<p>By subtracting the first equation from the second we get a unique value for <m>x_3</m> (<m>x_3=2</m>). But any pair of numbers that add up to <m>2</m> will work for <m>x_1</m> and <m>x_2</m>. Not only is the solution not unique, the solution set for this system is infinite!
</p>
<p>We can express the solution set of this system using set-builder notation and a parameter.
<md><mrow> \left\{ \langle 2-t, t, 2 \rangle \suchthat t \in \Reals \right\} </mrow></md>
</p>
<p>Notice how the parameter <m>t</m> allows the values of <m>x_1</m> and <m>x_2</m> to range over all possibilities that add up to <m>2</m>? We essentially let <m>x_2</m> have any value whatsoever (<m>t</m> can be any real number) and then we choose <m>x_1</m> in such a way that the sum is <m>2</m>. In a situation like this, <m>x_2</m> is known as a <em>free variable</em>.
</p>
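<p>Sage expresses the same infinite solution set using a free parameter of its own (it typically invents a name like <m>r_1</m>, though the exact name may vary between versions):
</p>
<sage>
<input>
x1, x2, x3 = var('x1, x2, x3')
solve([x1 + x2 == 2,
       x1 + x2 + x3 == 4], x1, x2, x3)
</input>
<output>
[[x1 == -r1 + 2, x2 == r1, x3 == 2]]
</output>
</sage>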
</solution>
</example>
</section>
<section xml:id="section-transformations">
<title>Transformations</title>
<p>A transformation is a function whose inputs and outputs are vectors. In order to discuss concepts like the range and domain of a transformation we'll need some terminology for <em>sets</em> of vectors. The set of all possible vectors of some type is known as a <em>vector space</em>. At first, we are going to be looking at the most basic and fundamental sorts of vector spaces <mdash /> where the vectors are ordered tuples of real numbers <mdash /> but be advised that later we will see that there are many other sorts of vectors!
</p>
<definition>
<title>Real Euclidean spaces</title>
<statement><p>Given a positive integer <m>n</m> we define the <term>real Euclidean space of dimension <m>n</m></term> (denoted <m>\Reals^n</m>) to
be the set of all ordered <m>n</m>-tuples of real numbers.
<md><mrow>\Reals^n \; = \; \{ \langle v_1, v_2, \ldots , v_n \rangle \, \suchthat \, \forall i, 1 \leq i \leq n, \, v_i \in \Reals \} </mrow></md>
</p>
</statement>
</definition>
<p>Recall that the <term>domain</term> of a function is the set from which the inputs come. The set where the outputs may appear is known as the <term>codomain</term> of the function. The codomain must be contrasted with the <term>range</term> which is the set of outputs that actually <em>do</em> occur. We are going to be presuming a certain familiarity with the basic terminology used with functions. You can skip over the following list of (informal) definitions if you are already familiar.
</p>
<p>
<dl>
<li><title>domain</title> <p>The set of all inputs for a function. The domain is sometimes specified while defining the function, but if it isn't, the convention is to use the biggest possible set for the domain.</p></li>
<li><title>codomain</title> <p>The set where the outputs of a function lie.</p></li>
<li><title>range</title> <p>The set of outputs that actually occur. (The range is generally a subset of the codomain.) </p></li>
<li><title>image</title> <p>If an element, <m>x</m>, of the domain is given, we refer to <m>f(x)</m> as the <em>image</em> of <m>x</m>.</p></li>
<li><title>pre-image</title> <p>If we have some <m>y</m> (an output) in mind, any <m>x</m> (an input) such that <m>f(x) = y</m> is called a <em>pre-image</em> of <m>y</m>.</p></li>
</dl>
</p>
<p>There is a bit of an asymmetry in the way we speak of the various sets that are related to a function. On the output side we have the codomain and the range. On the input side we have only the domain. There is no agreed upon name for a set that contains the domain, we simply insist that the function must be defined for every element of the domain (which basically sidesteps the issue). For the ordinary functions that one sees in calculus, the codomain is the real numbers; the range and domain are generally subsets of the real numbers. And so, the situation isn't terribly complex. When we are dealing with transformations things are harder. The domain and codomain of a transformation are generally real Euclidean spaces <mdash /> potentially of different dimensions <mdash /> so we will usually want to spell out what sorts of vectors are expected as inputs and what sorts of vectors we will see as outputs; only then do we get around to the heart of the matter: how do we compute the output from the input? We'll introduce the notation for a transformation via an example and then treat the general case.
</p>
<example>
<title>an example transformation</title>
<statement><p>Let's look at a transformation that takes vectors of length 6 as inputs, and outputs vectors of length 3. We'll refer to the input vector as <m>\vec{x}</m> and, as usual, its components will be <m>x</m>'s with subscripts:
<m>\vec{x} = \langle x_1, x_2, x_3, x_4, x_5, x_6 \rangle</m>. Similarly, the output will be <m>\vec{y} = \langle y_1, y_2, y_3 \rangle</m>. This is only an example so we'll just make up the rules that determine those output components from the input components, the point here is simply to demonstrate how one should write such a thing <mdash /> which is as follows:
<md>
<mrow>T:\Reals^6 \longrightarrow \Reals^3</mrow>
<mrow>T(\langle x_1, x_2, x_3, x_4, x_5, x_6 \rangle) \quad = \quad \langle x_1, x_3, x_5 \rangle .</mrow>
</md>
</p>
<p>So this transformation just picks out the odd-numbered components of <m>\vec{x}</m> and puts them in <m>\vec{y}</m>.
</p>
</statement>
</example>
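<p>A transformation like this is easy to express in Sage as an ordinary function that takes and returns vectors. (Keep in mind that Sage, like Python, numbers components starting from <m>0</m>.)
</p>
<sage>
<input>
def T(x):
    # keep components 1, 3 and 5 (indices 0, 2 and 4)
    return vector([x[0], x[2], x[4]])
T(vector([1, 2, 3, 4, 5, 6]))
</input>
<output>
(1, 3, 5)
</output>
</sage>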
<p>The most important transformations for us in this context are the <em>linear</em> ones. In a linear transformation, the components of the output vector are computed from the components of the input vector by <q>multiplying by constants and adding everything up.</q> Because of the simplistic way that the outputs are computed there is really nothing that can go wrong! With ordinary functions from <m>\Reals</m> to <m>\Reals</m> we usually look at the rule for computing the output and recognize certain values that must be eliminated from the domain <mdash /> typically where one sees <q>division by zero</q> or <q>square root of a negative</q> errors. No such problem can arise with linear transformations, the domain will always be a real Euclidean space of some dimension. Similarly, the codomain will be a real Euclidean space; one whose dimension is simply the number of components in the output vectors. The dimensions of the domain and codomain are easy to think about <mdash /> how many components do the input and output vectors have? The range of a linear transformation is slightly more complicated. The output vectors that actually occur will certainly be vectors having the number of components as specified by the codomain, but do all such vectors necessarily have to appear as outputs? In general, no.
</p>
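<p>The range phenomenon just described is easy to witness numerically. The following sketch (again Python with NumPy assumed; the transformation is made up purely for illustration) samples random inputs to the transformation <m>T(\langle x_1, x_2 \rangle) = \langle x_1, x_1 \rangle</m> mentioned above and confirms that every output lands on the line where the two components agree.</p>
<program language="python">
<input>
import numpy as np

def T(x):
    # A linear transformation whose range is a proper subset of
    # its codomain: both output components are always equal.
    return np.array([x[0], x[0]])

rng = np.random.default_rng(0)
for _ in range(5):
    y = T(rng.standard_normal(2))
    print(y, "components agree:", y[0] == y[1])
</input>
</program>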
<p>The notation for a linear transformation first spells out the domain and codomain and then gives the rule(s) for computing the output. Thus the domain and codomain are known in advance; we need to do a little extra work to figure out the range.</p>
<p>Before proceeding further we'll give some formal definitions.</p>
<definition>
<title>Transformations</title>
<statement><p>Given positive integers <m>m</m> and <m>n</m>, a <term>transformation from <m>\Reals^m</m> to <m>\Reals^n</m></term> is a function, <m>T</m>, that takes vectors of length <m>m</m> as inputs and returns vectors of length <m>n</m>. We write
<md>
<mrow>T:\Reals^m \longrightarrow \Reals^n</mrow>
<mrow>T(\vec{x}) \quad = \quad \vec{y},</mrow>
</md>
where the components of the vector <m>\vec{y}</m> will need to be specified in terms of the components of <m>\vec{x}</m>.
</p>
</statement>
</definition>
<definition>
<title>Domain of a transformation</title>
<statement><p>The <term>domain</term> of a transformation, <m>T</m>, is denoted by <m>\Dom{T}</m> and is generally a subset of <m>\Reals^m</m> (provided <m>T</m> is defined as above).
<md>
<mrow> \Dom{T} \; = \; \{ \vec{x} \in \Reals^m \suchthat T(\vec{x}) \, \mbox{is defined} \} </mrow>
</md>
</p>
</statement>
</definition>
<definition>
<title>Codomain of a transformation</title>
<statement><p>The <term>codomain</term> of a transformation, <m>T</m>, is denoted by <m>\Cod{T}</m> and is equal to <m>\Reals^n</m> (provided <m>T</m> is defined as above).
</p>
</statement>
</definition>
<definition xml:id="linearity-1">
<title>Linearity</title>
<statement><p>A transformation <m>T</m> is <term>linear</term> if and only if given any two elements <m>\vec{u},\vec{v} \in \Dom{T}</m> and any two real numbers <m>\alpha</m> and <m>\beta</m> we have
<me> T(\alpha \vec{u} + \beta \vec{v}) \; = \; \alpha T(\vec{u}) + \beta T(\vec{v}).</me>
</p>
</statement>
</definition>
<p>Linearity is a really important concept! We will be using the definition above over and over again. Let's try to nail down our understanding of this definition by translating it into ordinary language: a transformation is linear if and only if, when you apply it to a linear combination of vectors, the result is equal to what you get by forming the same linear combination of the images of those vectors. More succinctly: <q>The image of a linear combination is the same linear combination of the images.</q> My advice (seriously!) is to treat that last phrasing like a mantra <mdash /> repeat it to yourself until you fully absorb the meaning and it becomes second nature to you.
</p>
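<p>A numerical spot check is not a proof, but it can help the mantra sink in. Here is a small sketch (Python with NumPy assumed; the transformation is the component-picking example from earlier) that compares the image of a linear combination with the same linear combination of the images.</p>
<program language="python">
<input>
import numpy as np

def T(x):
    # The component-picking transformation from the earlier example.
    return x[[0, 2, 4]]

rng = np.random.default_rng(1)
u = rng.standard_normal(6)
v = rng.standard_normal(6)
alpha, beta = 2.5, -3.0

lhs = T(alpha * u + beta * v)        # image of the linear combination
rhs = alpha * T(u) + beta * T(v)     # linear combination of the images
print(np.allclose(lhs, rhs))         # True
</input>
</program>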
<p>Look back at the formal definition of linearity, and notice what it looks like symbolically: it appears as if the transformation <m>T</m> distributes over the sum and as if the scalars can be moved to the outside of the <m>T</m>'s.
An alternative definition of linearity is sometimes given which separates these two properties. This can be useful in formulating a proof that some transformation is linear (because it splits the argument into simpler parts).
</p>
<definition xml:id="linearity-2">
<title>Linearity (alternate definition)</title>
<statement><p>A transformation <m>T</m> is <term>linear</term> if and only if given any two elements <m>\vec{u},\vec{v} \in \Dom{T}</m> and any real number <m>\alpha</m>, both of the following hold:
<me> T(\vec{u} + \vec{v}) \; = \; T(\vec{u}) + T(\vec{v}),</me>
and
<me> T(\alpha \vec{u}) = \alpha T(\vec{u}). </me>
</p>
</statement>
</definition>
<p>Before we can go any further we have a small moral obligation to take care of. Since we've just presented two definitions for a concept we have a duty to verify that they actually define the same concept. If we state that two things are the same when they really aren't, we're making a <term>false equivalence</term>. One of the hallmarks of a good critical thinker is that they won't be taken in by false equivalences. So, what do you think? Are they definitely the same idea, or are there transformations that are linear by one definition but not by the other?</p>
<theorem><title>The two definitions of linearity are equivalent</title>
<statement>
<p>Consider a given transformation <m>T</m> from <m>\Reals^m</m> to <m>\Reals^n</m>. Let <m>\vec{u}</m> and <m>\vec{v}</m> be arbitrary vectors in <m>\Reals^m</m>, and let <m>\alpha</m> and <m>\beta</m> be arbitrary real numbers. Then
<md>
<mrow> T(\alpha \vec{u} + \beta \vec{v}) \; = \; \alpha T(\vec{u}) + \beta T(\vec{v})</mrow>
<intertext> if and only if </intertext>
<mrow> T(\vec{u} + \vec{v}) \; = \; T(\vec{u}) + T(\vec{v}) \quad \mbox{and} \quad T(\alpha \vec{u}) = \alpha T(\vec{u}).</mrow>
</md>
</p>
</statement>
<proof>
<case direction="forward">
<p>In this part of the proof we will be presuming the first statement (the definition of linearity given first) and showing that the second statement must be true.
</p>
<p>Assume that <m>T</m> is a transformation and that for every pair of vectors <m>\vec{u}</m> and <m>\vec{v}</m>, and every pair of real numbers <m>\alpha</m> and <m>\beta</m>,
<me>T(\alpha \vec{u} + \beta \vec{v}) \; = \; \alpha T(\vec{u}) + \beta T(\vec{v}).</me>
If we set <m>\alpha = \beta = 1</m> we get
<me>T(\vec{u} + \vec{v}) \; = \; T(\vec{u}) + T(\vec{v}).</me>
Similarly, if we leave <m>\alpha</m> arbitrary but set <m>\beta = 0</m> we get
<me>T(\alpha \vec{u}) = \alpha T(\vec{u}).</me>
</p>
</case>
<case direction="backward">
<p>In this part of the proof we will be working in the reverse direction, so we assume that both
<me>T(\vec{u} + \vec{v}) \; = \; T(\vec{u}) + T(\vec{v}) \quad \mbox{and} \quad T(\alpha \vec{u}) = \alpha T(\vec{u})</me> hold.
</p>
<blockquote>
<p>It's important to realize that the hypotheses we are using above are generic statements. When we write
<m>T(\alpha \vec{u}) = \alpha T(\vec{u})</m> the scalar <m>\alpha</m> and the vector <m>\vec{u}</m> are really beside the point. We are really asserting a general rule about how <m>T</m> interacts with scaled vectors
<mdash /> any other scalar times any other vector will work the same way. So for example, that hypothesis will also let us deduce that <me>T(\beta \vec{v}) = \beta T(\vec{v}).</me>
</p>
</blockquote>
<p>Consider <m>T(\alpha \vec{u} + \beta \vec{v})</m>. Using our first hypothesis (the one that shows how <m>T</m> distributes over sums) we get
<me>T(\alpha \vec{u} + \beta \vec{v}) \; = \; T(\alpha \vec{u}) + T(\beta \vec{v}).</me>
Using the second hypothesis (twice) we get
<me>T(\alpha \vec{u}) + T(\beta \vec{v}) \; = \; \alpha T(\vec{u}) + \beta T(\vec{v}).</me>
Finally, putting these pieces together we have
<me>T(\alpha \vec{u} + \beta \vec{v}) \; = \; \alpha T(\vec{u}) + \beta T(\vec{v})</me>
which is the desired result.
</p>
</case>
</proof>
</theorem>
<definition>
<title>Linear transformations</title>
<statement><p>Given positive integers <m>m</m> and <m>n</m>, a <term>linear transformation from <m>\Reals^m</m> to <m>\Reals^n</m></term> is a transformation, <m>T</m>, that takes vectors of length <m>m</m> as inputs, returns vectors of length <m>n</m>, and is <em>linear</em>. We write
<md>
<mrow>T:\Reals^m \longrightarrow \Reals^n</mrow>
<mrow>T(\vec{x}) \quad = \quad \vec{y},</mrow>
</md>
where the components of the vector <m>\vec{y}</m> will need to be specified in terms of the components of <m>\vec{x}</m>.
</p>
</statement>
</definition>
<p>There is an interesting connection between our use of the word <q>linear</q> in talking about linear transformations
and linear combinations. When a transformation is linear the functions that determine the output's components in terms of the input's components must <em>be</em> linear combinations. And <foreign>vice versa</foreign>, if the component functions are linear combinations then the transformation will be linear.
</p>
<p>The content of the previous paragraph may not be surprising from a linguistic perspective; mathematicians wouldn't use the same word if the underlying concepts were really different, would they? From a mathematical perspective it's a bit less obvious. Indeed, this is the sort of thing that mathematicians call a <em>theorem</em>. We'll state this theorem now, but we'll leave the proof to a later chapter.
</p>
<theorem><title>coefficients of a linear transformation</title>
<statement><p>Given a transformation <m>T: \Reals^m \longrightarrow \Reals^n</m>, <m>T</m> is linear if and only if,
for all input vectors <m>\vec{x}</m> the components of <m>T(\vec{x})</m> can be expressed as particular linear combinations of the components of <m>\vec{x}</m>.
</p>
</statement>
</theorem>
<p>In order to fully specify a linear transformation we need to give values for all of the constants that are used in the linear combinations expressing the <m>y_i</m>'s in terms of the <m>x_i</m>'s. For each of the <m>n</m> components of <m>\vec{y}</m>, we will need <m>m</m> numbers (as many as there are components in <m>\vec{x}</m>). In other words, we must specify <m>mn</m> constants.</p>
<definition>
<title>components of a linear transformation</title>
<statement><p>Given <m>mn</m> real numbers, <m>\aij{1}{1}, \ldots, \aij{n}{m}</m>, we say they are the components of a
linear transformation <m>T</m>,
<md>
<mrow>T:\Reals^m \longrightarrow \Reals^n</mrow>
<mrow>T(\vec{x}) \quad = \quad \vec{y},</mrow>
</md>
provided
<md>
<mrow> y_1 = \aij{1}{1} x_1 + \ldots + \aij{1}{m} x_m </mrow>
<mrow> \vdots </mrow>
<mrow> y_n = \aij{n}{1} x_1 + \ldots + \aij{n}{m} x_m .</mrow>
</md>
</p>
</statement>
</definition>
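<p>To see the <m>mn</m> constants in action, here is a sketch in Python (NumPy assumed, with made-up coefficient values) of a linear transformation from <m>\Reals^3</m> to <m>\Reals^2</m>, so <m>m = 3</m>, <m>n = 2</m>, and we need <m>6</m> constants. The last line hints at where we are headed in the next section.</p>
<program language="python">
<input>
import numpy as np

# Six made-up constants a_ij; row i holds the coefficients of the
# linear combination that produces the output component y_i.
a = np.array([[1.0,  0.0, 2.0],
              [0.0, -1.0, 3.0]])

def T(x):
    # y_i = a_i1 * x_1 + ... + a_im * x_m
    return np.array([sum(a[i, j] * x[j] for j in range(3))
                     for i in range(2)])

x = np.array([1.0, 2.0, 3.0])
print(T(x))    # [7. 7.]
print(a @ x)   # the same numbers, as a matrix-vector product
</input>
</program>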
</section>
<section>
<title>Matrix notation</title>
<p>The three seemingly distinct viewpoints we've considered are unified by the concept of a <term>matrix</term>.</p>
<p>The word <q>matrix</q> comes from Latin, where it means <em>womb</em>; it entered the English language with a variety of meanings. In mathematics, matrix (pl. matrices) always means a table containing numerical values. It is rather hard to guess how a word meaning <q>womb</q> morphed into one meaning <q>table of numbers</q>, but languages are funny that way<ellipsis />
</p>
<p>Generally speaking, a table of numbers will have some arbitrary number of rows and columns. There are some special cases that we'll need to talk about, but let's look at the general situation first. We'll use the variable <m>m</m> to refer to the number of rows in a matrix and the variable <m>n</m> to refer to the number of columns. We'll use upper-case letters (about 90% of the time: <m>A</m>) to refer to the whole table as a single entity, in which case we'll speak of <m>A</m> being an <m>m \times n</m> matrix. The entries of a matrix will usually be denoted using the corresponding lower-case letter with <em>two</em> subscripts. This is (hopefully) reminiscent of the doubly-indexed quantities we saw near the end of Section <xref ref="section-transformations" />: the components of a linear transformation.
</p>
<example><title>matrix notation</title>
<p>Here are a couple of matrices:
<me> A = \left[ \begin{array}{ccc} 1 \amp 4 \amp 9 \\ 7 \amp \pi \amp 42 \end{array} \right] \quad \mbox{and} \quad B = \left[ \begin{array}{cc} -1 \amp 11 \\ -3 \amp e \end{array} \right]</me>
</p>
<p>Notice how we are referring to the entire tables with the variables <m>A</m> and <m>B</m>? If we need to refer to the individual entries of a matrix we'll write things like <m>\aij{2}{3} = 42</m> (the number in the 2nd row and 3rd column of <m>A</m> is 42), or <m>\bij{1}{2} = 11</m> (the number in the 1st row, 2nd column of <m>B</m> is 11).
</p>
<p>It's also fairly common to ignore this lower-case convention! That is, you may also see things like <m>A_{1\:\!3} = 9</m> and <m>B_{2\:\!2} = e</m>.
</p>
</example>
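<p>If you like to experiment, matrices and their entries are easy to play with in Python via NumPy (assumed here). The one wrinkle, noted in the comments, is that NumPy indexes rows and columns starting from 0 while our subscripts start from 1.</p>
<program language="python">
<input>
import numpy as np

A = np.array([[1, 4,     9],
              [7, np.pi, 42]])
B = np.array([[-1, 11],
              [-3, np.e]])

# NumPy counts from 0, so the entry we call a_23 (row 2, column 3)
# is A[1, 2] here, and b_12 is B[0, 1].
print(A[1, 2])   # 42.0
print(B[0, 1])   # 11.0
</input>
</program>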
<p>Now to the special cases. When the number of columns is <m>n=1</m>, the matrix is known as a <term>column vector</term>. When the number of rows is <m>m=1</m>, the matrix is known as a <term>row vector</term>. There is clearly a choice to be made as to whether the things we have been referring to as (merely) <q>vectors</q> are going to be represented as column vectors or as row vectors. Here's a surprising thing! Your Calculus teachers and I (up until now) have been lying to you. When we wrote vectors as (for example) <m>\vec{v} = \langle 1, 2, 3 \rangle</m>, it was only for convenience. A row of numbers fits more easily on the page than a column does. For a variety of reasons it makes sense to treat vectors as columns of numbers, not rows.
</p>
<p>There is an operation known as <term>transposition</term> that changes row vectors into column vectors and <foreign>vice versa</foreign>. The <term>transpose</term> of a matrix is indicated by a superscript T; the rows of the transposed matrix are the columns of the original matrix, and its columns are the original matrix's rows. This idea (interchanging rows and columns) is surprisingly important and we'll be using it quite a bit in the future. For the moment let's just notice that it gives us a nice way to write a column vector <mdash /> with the typographical advantage that the components appear in a row!
</p>
<p>To summarize what the last few paragraphs have said: It is technically not right to write <m>\vec{v} = \langle 1, 2, 3 \rangle</m>, we should really write <m>\vec{v} = \left[ \begin{array}{c} 1 \\ 2\\ 3 \end{array} \right]</m>, but that takes up too much vertical space so instead we write <m>\vec{v} = [ 1 \; 2 \; 3 ]^T</m>. This may all seem like too high a price to pay for accuracy, but it will pay future dividends if we start thinking now about rows and columns and how to switch between them.
</p>
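<p>Transposition is also a one-character operation in NumPy (assumed again in this sketch), which makes it painless to follow the convention just described: store vectors as columns, but type them as transposed rows.</p>
<program language="python">
<input>
import numpy as np

row = np.array([[1, 2, 3]])   # a 1 x 3 row vector
col = row.T                   # its transpose: a 3 x 1 column vector

print(row.shape)   # (1, 3)
print(col.shape)   # (3, 1)
print(col)
# [[1]
#  [2]
#  [3]]
</input>
</program>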
<p>If we only had row and column vectors to worry about we'd probably find some other way to distinguish them <mdash /> maybe there'd be red vectors and blue vectors!
</p>
<note>
<p>In Physics (especially in the Tensor Analysis used in, e.g., General Relativity) one distinguishes between covariant and contravariant indices. An entity with a single contravariant index is a vector; if instead there is a single covariant index, it is known as a co-vector. These concepts aren't identical to row/column vectors, but nevertheless, contravariant vectors are usually written as columns and covariant vectors as rows.</p>
</note>
<p>By convention there is no need to refer to the entries of a row or column vector using double indices <mdash /> one of them would always be 1 so we can omit it. When we have more general matrices, where <m>m</m> and <m>n</m> are both greater than <m>1</m>, the roles of rows and columns are more evident and two indices will be necessary to refer to the entries.
</p>
<p>One useful way to think about matrices is the following: When we write down a system of equations, a lot of the symbols that we write are redundant. If we eliminate all of the stuff that is utterly predictable, we are left with a table of numbers <mdash /> in other words, a matrix. So one way to think of matrices is that they are highly abbreviated ways of referring to a system of linear equations. In this scheme the rows of the matrix correspond to the individual equations in the system and the columns contain all the coefficients that multiply a given variable. A short example will probably help:
</p>
<example><title>Converting a linear system to matrix form</title>
<statement>
<p>
Consider the following system of <m>3</m> equations in <m>4</m> variables.
<me>
\begin{alignedat}{8}
x_1 \amp {}+{} \amp x_2 \amp \amp \amp {}+{} \amp 3 x_4 \amp {}={} \amp 101 \\
2 x_1 \amp {}-{} \amp x_2 \amp {}+{} \amp x_3 \amp {}+{} \amp x_4 \amp {}={} \amp 102 \\
\amp \amp 3 x_2 \amp {}-{} \amp x_3 \amp {}+{} \amp 2x_4 \amp {}={} \amp 103
\end{alignedat}
</me>
</p>
<p>Now we'll take one step backward before proceeding two steps forward. If a variable appears but has no coefficient, that just means the coefficient is <m>1</m>. If a variable doesn't appear at all, that means the coefficient is <m>0</m>.
Finally, if we see subtraction we can always replace it with addition (by putting a minus sign on the coefficient). So, let's re-express this system in a fully anal-retentive way<ellipsis />