Asking About Data
Exploring Different Realities of Data via the
Social Data Flow Network Methodology
Brian Ballsun-Stanton
The University of New South Wales
THE UNIVERSITY OF NEW SOUTH WALES
Thesis/Dissertation Sheet
Surname or Family name: Ballsun-Stanton
First name: Brian
Other name/s:
Abbreviation for degree as given in the University calendar:
PhD
School: Humanities
Faculty: Arts and Social Sciences
Title: Asking About Data: Exploring Different Realities of Data via the Social Data Flow Network Methodology
Abstract
What is data? That question is the fundamental investigation of this dissertation. I have developed a
methodology from social-scientific processes to explore how different people understand the concept of data, rather than to rely on my own philosophical intuitions or thought experiments about
the “nature” of data. The evidence I have gathered as to different individuals' constructions of data
can be used to inform further inquiry of data and the design of information systems.
My research demonstrates that people have different constructions of data. The methodology of the
SDFN, created for this dissertation, has proven able to probe those understandings. The SDFN,
loosely based on a DFD and combined with ideas from SNA, provides a way of discovering practical
definitions of hard-to-operationalize terms like data. The process of repeatedly categorizing various
items as data allows the methodology to explore how participants actually use the term, rather than
relying on theoretical dictionary-based definitions.
Analysis of the interviews found three different constructions of data: data as communications, a
container for meaning; data as subjective observations, sense-impressions filtered by knowledge;
and data as objective facts, measurements revealing the relationships of reality.
Declaration relating to disposition of project thesis/dissertation
I hereby grant to the University of New South Wales or its agents the right to archive and to make available my thesis or dissertation in whole or in
part in the University libraries in all forms of media, now or hereafter known, subject to the provisions of the Copyright Act 1968. I retain all property
rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.
I also authorise University Microfilms to use the 350 word abstract of my thesis in Dissertation Abstracts International.
……………………………………………………………
Signature
……………………………………..………………
Witness
……….……………………...…….…
Date
The University recognises that there may be exceptional circumstances requiring restrictions on copying or conditions on use. Requests for restriction
for a period of up to 2 years must be made in writing. Requests for a longer period of restriction may be considered in exceptional circumstances and
require the approval of the Dean of Graduate Research.
FOR OFFICE USE ONLY
Date of completion of requirements for Award:
ORIGINALITY STATEMENT
‘I hereby declare that this submission is my own work and to the best of my
knowledge it contains no materials previously published or written by another
person, or substantial proportions of material which have been accepted for the
award of any other degree or diploma at UNSW or any other educational
institution, except where due acknowledgement is made in the thesis. Any
contribution made to the research by others, with whom I have worked at
UNSW or elsewhere, is explicitly acknowledged in the thesis. I also declare that
the intellectual content of this thesis is the product of my own work, except to
the extent that assistance from others in the project's design and conception or
in style, presentation and linguistic expression is acknowledged.’
Signed ……………………………………………..............
Date
……………………………………………..............
Asking About Data
Exploring Different Realities of Data via the
Social Data Flow Network Methodology
By: Brian Ballsun-Stanton
A thesis submitted for the degree of
DOCTOR OF PHILOSOPHY
School of Humanities
March 2012
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported
License. For more details go to: http://creativecommons.org/licenses/by-nc-sa/3.0/
Contents
1 Acknowledgements
1.1 Figures
2 Abbreviations
3 Abstract
4 Introduction
4.1 Methodological Summary
4.2 Analysis and Results
4.2.1 Questions of Interest and the Methodology of Analysis
4.2.1.1 Question of interest 1: do people have different realities of data?
4.2.1.2 Question of interest 2: can my methodology probe people’s realities of data?
4.3 Interview Analysis
4.3.1 Result 1: Data as communications
4.3.2 Result 2: Data as subjective observations
4.3.3 Result 3: Data as measured facts
5 Literature Review
5.1 Hierarchies of Data
5.1.1 Ackoff’s Traditional Hierarchy
5.1.2 Tuomi’s Cyclic Hierarchy
5.2 Links with the Literature
5.2.1 Zins’ concepts
5.2.2 Trading Zone
5.2.3 Evaluative accents
5.3 Justification
5.4 Aims
6 Methodology
6.1 The Social Data Flow Network
6.1.1 Terms
6.1.1.1 Entity
6.1.1.2 Flow
6.1.1.3 Entity Dictionary
6.1.2 Creating the SDFN
6.1.3 Running an Interview
6.1.4 Timing Considerations
6.2 Conducting the Interview
6.2.1 Introduction
6.2.2 SDFN Building
6.2.2.1 Methodology of the SDFN
6.2.3 Theoretical Discussion
6.3 Survey
6.3.1 Tools and Techniques of the Survey
7 Interviews
7.1 Interview 1
7.1.1 Collected Drawings
7.1.2 Annotated Transcript
7.1.3 Personal Reflection
7.2 Interview 2
7.3 Interview 3
7.4 Interview 4
7.5 Interview 5
7.6 Interview 6
7.7 Interview 7
7.8 Interview 8
7.9 Interview 9
7.10 Interview 10
8 Personal Reflections
8.1 Summary of Interview Analysis
8.2 Summary of Survey Analysis
8.3 Interviews
8.3.1 Data as Communications
8.3.1.1 Interview I
8.3.1.2 Interview II
8.3.2 Data as Subjective Observations
8.3.2.1 Interview III
8.3.2.2 Interview IV
8.3.3 Data as Facts
8.3.3.1 Interview V
8.3.3.2 Interview VI
8.3.3.3 Interview VII
8.3.3.4 Interview VIII
8.3.3.5 Interview IX
8.3.3.6 Interview X
8.3.4 Reflection Conclusion
8.4 Surveys
8.4.1 Survey I
8.4.2 Survey II
8.4.3 Survey III
8.4.4 Survey IV
8.4.5 Survey V
8.4.6 Survey VI
8.4.7 Survey VII
8.4.8 Survey VIII
8.4.9 Survey IX
8.4.10 Survey X
8.4.11 Survey XI
8.4.12 Survey XII
8.4.13 Survey XIII
8.4.14 Survey XIV
8.4.15 Survey XV
8.5 Survey Analysis
9 Different Realities of Data and the Database
9.1 Case Study
9.2 System Models and Mental Maps
9.3 The Consequences of Error
9.4 Database as Corporate Mind
10 Conclusion
10.1 Results
10.2 Methodological conclusions
10.3 Further Research
10.4 Final reflection
11 Appendix A: Survey Text
12 Appendix B: Graphviz
12.1 pretty.gv
12.2 general.dot
12.2.1 Terminal Command
12.2.2 Output
13 Appendix C: Flyer
14 Appendix D: Asking About Data
15 References
15.1 In-Text Citations
15.2 Works Consulted
1 Acknowledgements
I thank the employees of BlueScope Steel in Australia for participating in my
survey and answering difficult questions about how they thought about the
world. Their participation has made my research possible.
I appreciate the members of the Pentagon’s INTELST mailing list for their
participation in the survey and Kevin DiVico for putting me into contact with
them and his assistance in other matters.
I extend a great deal of gratitude to my present and past advisers, Dr.
Anthony Corones and Dr. Deborah Bunker. Without their expert guidance and
assistance, I would not have had a dissertation of this quality.
I sincerely thank John Rennie for his expert editing assistance, lending a final
polish to this work.
Lucy: "Coffee on Sunday?"
Mikhail: I can intone only: "Ia! Ia! Cthulhu fhtagn! Ph’nglui mglw’nafh
Cthulhu R’lyeh wgah-nagl fhtaga -"
To the rest of my friends: thank you for keeping me sane-ish during this
three-year ordeal. Without your time, friendship, and support, I would never have
completed this work.
1.1 Figures
The title background is a heavily photoshopped version of Jim Sanborn’s Kryptos statue. The image was
provided to the Wikimedia Commons under a Creative Commons CC BY-SA 3.0 license. As this work
is also licensed under a CC BY-NC-SA license, it satisfies all requirements. Original image available at:
http://en.wikipedia.org/wiki/File:Kryptos01_1.jpg
The other external image I use, figure 4.1, is also CC licensed, used by permission of the author,
Daniel P. Lee. http://www.flickr.com/photos/yankeeincanada/3658159896/
All other images were produced by myself for this dissertation.
2 Abbreviations
DFD   Data Flow Diagram
HCI   Human-Computer Interaction
IST   Information Systems & Technology
SDFN  Social Data Flow Network
SNA   Social Network Analysis
UoD   Universe of Discourse
3 Abstract
What is data? That question is the fundamental investigation of this dissertation. I have
developed a methodology from social-scientific processes to explore how different people
understand the concept of data, rather than to rely on my own philosophical intuitions
or thought experiments about the “nature” of data. The evidence I have gathered as to
different individuals’ constructions of data can be used to inform further inquiry of data
and the design of information systems.
My research demonstrates that people have different constructions of data. The
methodology of the Social Data Flow Network, created for this dissertation, has proven
able to probe those understandings. The Social Data Flow Network, loosely based on a
Data Flow Diagram and combined with ideas from Social Network Analysis, provides a way
of discovering practical definitions of hard-to-operationalize terms like data. The process
of repeatedly categorizing various items as data allows the methodology to explore how
participants actually use the term, rather than relying on theoretical dictionary-based
definitions.
Analysis of the interviews found three different constructions of data: data as
communications, a container for meaning; data as subjective observations, sense-impressions filtered by knowledge; and data as objective facts, measurements revealing the
relationships of reality *.
*
For a longer summary of this research, see Appendix D. The peer-reviewed paper on page 393 was
presented at the IEEE 5th International Conference on Computer Sciences and Convergence Information
Technology in Seoul, Korea during the process of writing the thesis.
4 Introduction
In Information Systems & Technology (IST) studies, I have noticed that practitioners
use and understand the term “data” differently than the people they are helping. The
purpose of this research is to explore the different conceptions of data that may exist
beyond the domain of IST and demonstrate a methodology that allows practitioners to
access the conceptions of data present in their workplace.
Exploring a conception of data is fundamentally a philosophical problem. A person’s conception of data stems from the affordances they attach to it, their belief in
its underlying qualities, and their differentiation between data and non-data. However,
this philosophical problem cannot be solved through intuition alone: a methodology is
necessary to extract a person’s conception of data.
These individual conceptions can then be formalised as “philosophies of data.” By
‘philosophies’ we mean answers to questions like ‘What is data?’, ‘What is data for?’,
‘How do I know the data is reliable?’, and ‘What are the properties of data?’ While
individuals may not “have philosophies,” understanding that individuals engage philosophically
with their conceptions of data allows the creation of a tool to probe those philosophical
conceptions of data in a workplace. By probing conceptions, the IST practitioner
effectively uncovers de facto philosophies of data in individuals.
This research, however, does not propose to uncover fundamental philosophies
of data, only some common conceptions of data that may exist in workplaces. These
different conceptions of data can produce frustration, error, and miscommunication if
people with different conceptions interact unknowingly. Conceptions of data include
context, reliability, constraints as to its nature (can it be a description, must it be a
number), the means of collection, and the means of manipulation.
I have created a methodology called the Social Data Flow Network (SDFN). This interview technique has elicited people’s conceptions of data *, demonstrating three different
conceptions within a particular industrial research workplace. A survey developed from
the SDFN technique hints that there may be different conceptions of data present in the
intelligence analysis community and the IST practitioner community.
It is my hope that IST practitioners can use the SDFN I have developed to make better
interfaces and databases: through the understanding of a client’s expectations of data, the
system can provide natural interaction methods that conform to the client’s expectations
of what data is and is not. The SDFN might also be used within an organization to reduce
miscommunication and error by making explicit one particular conception of data
for that workplace.
4.1 Methodological Summary
The primary result of this thesis is the methodology of the Social Data Flow Network.
The SDFN uses repeated categorization to explore how individuals group informational
or communicative flows into categories. By eliciting categories that focus on data,
information, and knowledge, the participants use the categorization to operationalize
their epistemological understanding of data: they indicate what is and is not data and
how it becomes information and knowledge. This elicitation helps both the interviewer
and the participant to discover their own situational conceptualization of data.
The repeated categorization allows participants to generate and resolve cognitive
dissonance situated around the differences between their theoretical definitions of data
and their practical uses and categorizations of data. In interviews, participants demonstrated
a refined understanding of their own conceptions of data at the end of the interview,
catalyzed through their participation in the SDFN.
*
Their de facto philosophical approaches towards knowing that something is or is not data.
The SDFN involves the articulation of roles as entities, descriptions of content
flows between those entities, and the categorization of those flows as data, information,
knowledge, or other. Participants iterate over a task domain defined at the start of the
interview, discussing all the entities involved in the task and the flows between them.
The interview concludes with an opportunity for the participant to self-reflect on
their “philosophy” of data, discussing what they categorize as data and how it becomes
information and/or knowledge.
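The elements just described (entities, flows between them, and a category for each flow) can be pictured as a small data structure. The following Python sketch is illustrative only and is not part of the methodology as practised: the names `Category`, `Flow`, and `SDFN`, the task domain, and the three example flows are all invented for this illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Category(Enum):
    """The categories a participant may assign to a flow."""
    DATA = "data"
    INFORMATION = "information"
    KNOWLEDGE = "knowledge"
    OTHER = "other"


@dataclass(frozen=True)
class Flow:
    """A directed content flow between two entities (roles)."""
    source: str
    target: str
    description: str
    category: Category


@dataclass
class SDFN:
    """Hypothetical container for one participant's diagram."""
    task_domain: str
    entities: set = field(default_factory=set)
    flows: list = field(default_factory=list)

    def add_flow(self, source, target, description, category):
        # Entities are registered implicitly when a flow mentions them.
        self.entities.update((source, target))
        self.flows.append(Flow(source, target, description, category))

    def flows_in(self, category):
        """Descriptions of every flow the participant placed in `category`."""
        return [f.description for f in self.flows if f.category is category]


# Invented example, not taken from any interview.
sdfn = SDFN(task_domain="preparing a monthly test report")
sdfn.add_flow("Technician", "Researcher", "thickness measurements", Category.DATA)
sdfn.add_flow("Researcher", "Manager", "summary report", Category.INFORMATION)
sdfn.add_flow("Manager", "Researcher", "request to retest", Category.OTHER)

print(sdfn.flows_in(Category.DATA))
```

Listing the flows a participant placed in each category, as `flows_in` does, is one simple way to see what that participant does and does not treat as data.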
A scenario-based survey, inspired by the SDFN, was also trialled with less satisfactory
results. While the survey did demonstrate that intelligence officers, IST professionals,
and other industrial research employees did have different conceptions of data, it did
so with neither statistical rigor nor the depth of discussion that the interviews
provided.
The SDFN combines two concepts for a novel purpose. It is a graph * that combines
the idea of the social network with that of the data flow diagram. In social network
analysis, it is possible to represent interactions between people, a social network, through
graphs. Each node on a graph represents a person and each edge represents some sort
of connection between people, as a function of the interactions of interest to the researcher [1].
*
A graph, strictly speaking, is any diagram that contains edges and nodes. A node is the component of
a graph that is a point. The point can be labeled or unlabeled. The node is the element of the graph
that is a representation of a thing. Sometimes the thing being represented is a computer, a person,
or a place, but in any event the node represents a noun. Edges, on the other hand, are the relationships
or connections between nodes. An edge represents a “flow” of action or stuff between nodes. Edges
traditionally have served as network links, roads, phone lines, and simple representations of adjacency. A
graph is a non-topological method of representing the relationships between entities through edges and
nodes.
Edges can be directed: they show a flow or relationship from one node to another. The direction on the
edge indicates the direction of relationship. For example, consider Alice and Bob. To represent Alice
sending a letter to Bob, we would make both of them nodes and draw a directed edge from Alice to Bob
indicating the one-way flow of the letter. By adding the concept of directionality to edges, a causal element
is introduced to the representation: specifically, that the originating node causes a relationship to the
recipient node. This addition of causality then precipitates the idea of connectedness.
A node may or may not be reachable from other nodes. A graph or subgraph where every node can be reached
from every other node is called a strongly connected graph. A directed graph that is connected only when
the direction of its edges is ignored is called weakly connected. When we apply the idea of strongly
connected graphs to social networks, we can identify small groups by identifying strongly connected
subgraphs within a larger, weakly connected graph.
The Data Flow Diagram [2 and 3] contributes its diagrams to the SDFN. A DFD originally
was designed for structured programming. The document produced by the DFD
would combine the delineation of a universe of discourse via the context diagram with
the highly precise definition of flows into and out of that diagram. A Universe of
Discourse (UoD) [4] is the term used for defining the topic under consideration. Everything
within the UoD is relevant and must be modeled. Everything outside the UoD is irrelevant.
Interestingly, as the DFD was repurposed for business modeling, the UoD remained the
same: it is still asking, “What bit of reality do we care about right now?”
The DFD would then be refined through a process of “zooming in” on that context
diagram to expose the transformations required to produce the outputs from the inputs.
Each additional level would seek to conserve inputs and outputs, and thereby produce a
diagram that could be mapped to the functions and variables necessary for a structured
program.
The DFD contributes great ideas to the SDFN. It contributes the idea that data is
something that can be modeled. The conception of data embodied by the DFD is that the
modeler can translate reality into data-as-bits and that data could be described through
text. All actions in the data flow diagram are considered either flows or transformations.
Data flows from sources, through transformations, and out into sinks. The sources and
sinks are entities outside the scope of the diagram. By decomposing these transformations
into ever simpler and more detailed sets of sub-transformations, modelers could
design an entire software system intended to process and transform data. The modeler
acts as translator: taking the reality described by the client and forcing it into a
computerized mold. Repurposing the methodology of the DFD by subtracting the modeler’s
translation suggests that it might be possible to use my method to probe and document
a client’s subjective reality.
The DFD also contributes an iterative structure for the definition of reality. The
iterative techniques explore the UoD in order of increasing specificity from the vague
context diagram describing the universe of discourse to highly detailed sub-sub-sub (etc.)
transformations required deep in the diagram. By starting with broad generalizations,
the DFD ensured that the client was thinking about the whole task and did not immediately
become fixated on any one aspect. With the DFD iterating across each declared “transformation” and decomposing it, the details of each transformation were both evoked and
then situated in the scaffolding of the broader context. The requirement to conserve
inputs and outputs eliminated any question of missing aspects of the diagram or other
design-based blind alleys. The idea of iterative exploration and definition is extremely
valuable to the SDFN.
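The conservation requirement can be made concrete with a small check. The Python sketch below is only an illustration under stated assumptions: the function names `external_flows` and `conserves`, the tuple format for flows, and the order-handling example are invented here and do not come from the DFD literature cited above. It tests whether a decomposed diagram exposes exactly the inputs and outputs of the transformation it refines.

```python
def external_flows(flows, processes):
    """Split flow labels into those entering and those leaving `processes`.

    `flows` is a list of (source, target, label) tuples. Any endpoint that
    is not in `processes` is treated as lying outside the decomposition.
    """
    inside = set(processes)
    inputs = {label for src, dst, label in flows
              if src not in inside and dst in inside}
    outputs = {label for src, dst, label in flows
               if src in inside and dst not in inside}
    return inputs, outputs


def conserves(parent_inputs, parent_outputs, child_flows, child_processes):
    """True if the child diagram has exactly the parent's inputs and outputs."""
    inputs, outputs = external_flows(child_flows, child_processes)
    return inputs == set(parent_inputs) and outputs == set(parent_outputs)


# Invented example: the parent transformation "Handle order" takes an
# order in and sends an invoice out.
child = [
    ("Customer", "Validate order", "order"),
    ("Validate order", "Prepare invoice", "valid order"),
    ("Prepare invoice", "Customer", "invoice"),
]
steps = ["Validate order", "Prepare invoice"]
balanced = conserves({"order"}, {"invoice"}, child, steps)

# Adding an output the parent never declared breaks conservation.
extra = child + [("Prepare invoice", "Warehouse", "pick list")]
unbalanced = conserves({"order"}, {"invoice"}, extra, steps)

print(balanced, unbalanced)  # True False
```

Internal flows such as "valid order" are ignored by the check, since only flows crossing the boundary of the decomposed transformation have to match the parent.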
The Social Network Graph provides the concept of a social network * to the SDFN.
The Social Network Graph also contributes a novel idea about the scope of edges. Edges
in the DFD were simple flows of data, representing the movement of trivial signs. In the
social network graph, edges can be individual communications, orders, relationships,
and objects. The huge diversity of edge types suggested by a social network graph, when
combined with the DFD, ruins the DFD for its original purpose: the modeling of software
systems. However, that diversity also suggests different possible models that can be
applied to the DFD format.
*
A social network graph is a mapping of a person’s relationships with other people into non-topological
graph format. Each relationship is a directed edge; each person, a node. The social network graph is used
in many different fields: communications, social media, and sociology are some of them. In many ways,
the idea of the social network graph is strongly related to the ideas of actor-network theory [5].
In communicative analysis, social network graphs are used for linguistic analysis *.
It is possible to explore the control structures of a group by noting, with an edge, who is
talking to whom. By exploring the frequency and directionality of those notes, analysts
gain insights into the power and influence roles of social networks. As such, the “thought
leaders” of the small group can be identified.
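The frequency-and-direction analysis described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the cited studies: the function names and the list of replies are invented, and counting messages received is only one crude indicator of influence.

```python
from collections import Counter


def reply_weights(replies):
    """Count how often each directed (sender, recipient) pair occurs."""
    return Counter(replies)


def replies_received(weights):
    """Total number of messages addressed to each person."""
    received = Counter()
    for (sender, recipient), count in weights.items():
        received[recipient] += count
    return received


# Invented example: each tuple is one message from sender to recipient.
replies = [
    ("ann", "bob"),
    ("ann", "bob"),
    ("cat", "bob"),
    ("bob", "ann"),
    ("dan", "dan"),  # a reply to oneself becomes a loop on the graph
]

weights = reply_weights(replies)
print(weights[("ann", "bob")])                   # 2
print(replies_received(weights).most_common(1))  # [('bob', 3)]
```

Each count in `weights` corresponds to the weight of one directed edge, which a drawing could render as line thickness.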
Moreover, by graphing flows of communication, it is possible to identify small
groups within larger groups, as these small groups will communicate strongly among
themselves and only weakly with nodes outside. In other circles, this behavior is known as
siloing [8]. One design intent of the SDFN is to confer the ability to identify siloing. By
rendering flows between members of an organization, it should be possible to identify
strongly connected sub-graphs, which suggest communicative silos within that organization.
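As a sketch of how such silos might be located once flows have been recorded, the Python below finds strongly connected sub-graphs with Kosaraju's two-pass algorithm. The choice of algorithm, the function name, and the example organization are assumptions made for this illustration; the thesis does not prescribe an implementation.

```python
def strongly_connected_components(edges):
    """Return the strongly connected components of a directed graph.

    Uses Kosaraju's two-pass algorithm; `edges` is an iterable of
    (source, target) pairs.
    """
    graph, reverse = {}, {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())
        reverse.setdefault(dst, set()).add(src)
        reverse.setdefault(src, set())

    # Pass 1: record nodes in order of depth-first completion.
    visited, order = set(), []
    for start in graph:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                # Every neighbour has been explored: the node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: walk the reversed graph in reverse completion order.
    assigned, components = set(), []
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component, todo = {start}, [start]
        while todo:
            node = todo.pop()
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    component.add(prev)
                    todo.append(prev)
        components.append(component)
    return components


# Invented example: two groups that talk among themselves, joined by
# one-way flows only.
flows = [("a", "b"), ("b", "c"), ("c", "a"),   # first group
         ("c", "d"),                           # one-way link between groups
         ("d", "e"), ("e", "d"),               # second group
         ("e", "f")]                           # f only receives
silos = [c for c in strongly_connected_components(flows) if len(c) > 1]
print(sorted(sorted(c) for c in silos))  # [['a', 'b', 'c'], ['d', 'e']]
```

Components of size one are filtered out because a single entity that never takes part in a two-way exchange does not form a group.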
The social network graph contribution alters the diagramming rules of the DFD.
Social network entities can be any actor that participates in a communication. The SDFN
is a diagram exploring flows of data between actors, instead of flows between
transformations. By creating a web of affiliation [9] between these entities, it should be possible
to describe the communicative realities that an individual perceives. It should therefore
be possible to explore how they understand the nature of data by exploring how they
describe its movement from entity to entity in the SDFN.
*
Figure 4.1 provides a trivial example of linguistic analysis as applied to a set of Twitter replies during a
conference. The different line weights are used to denote the quantity of communications along a radially
distributed set of nodes. Other approaches can be far more complex, looking at patterns beyond simple
frequency [6 and 7].
Figure 4.1: Social network graph of #sla2009 tweet replies to June 19, 2009
“The thicker the line, the more times you sent an @reply to that person. The
more lines you have, the more @replies to different people you sent. If you
don’t appear on the graph, but know that you sent out @replies, it’s because
the person you sent your @reply to never sent out an @reply and so that
person won’t appear on the graph and unfortunately, you can’t either! Interestingly,
a few people only sent replies to themselves, so they do appear on
the graph as a line that goes back to themselves.” Image used with permission,
created by: Daniel P. Lee, MLIS.
Despite the terminology of actors, and the use of a social network, my research
does not yet incorporate actor-network theory [10]. While Latour’s work offers many
useful ideas for understanding the world, it still imposes a framework from which biases
may be imparted. Therefore, while I do not use actor-network theory here, it may be
useful in later research exploring the implications of held philosophies on Latour’s work.
The SDFN does not try to be explanatory, comprehensive, or objective. The point of
the SDFN is to reveal part of how the participant understands a concept, not to build upon
that understanding nor transform it into a model for a computer system. Consequently,
no design provisions in the methodology allow two or more people’s categories to be
reconciled. More work will be necessary before the SDFN can be used directly as a design
methodology.
4.2 Analysis and Results
These questions of interest are posed for the reader to keep in mind in the results section.
My personal analysis, presented after the “raw data,” uses these questions of interest as
framing devices for my reflections on the individual interviews.
4.2.1 Questions of Interest and the Methodology of Analysis
My “hypotheses” are described as questions of interest to reflect the rapid iterative nature
of abductive explorations. They provide research directions that act as broad guides
to the formation of a universe of discourse for future research rather than predictive
statements about reality.
The intent of the questions is to frame the analysis and guide it towards useful and
interesting areas. We need to consider how the evidence relates to these questions of
interest.
Each interview, after transcription, was subjected to recursive analysis for my personal reflections on the interviews. I summarized six to ten lines of each interview in a
one-line summary. Then between three and six summaries were summarized, filtering
for statements about the user’s conception of data. Although self-transcription transmits personal bias, two significant factors prevent a traditional double-blind study. An
untested methodology is no place for the mass utilization of volunteer interviewers. The
limited scope allowed me to retain control of the interview process and to provide for
the best possible interviews for each participant while retaining the basics of the SDFN.
Because I conducted each interview, the bias would have already been introduced; providing for pious-sounding human coding would have lent false reliability to something
inherently subjective.
My personal reflections are very simple. I have tried to extract each participant’s
intuitions about data from the recursive analysis.
4.2.1.1 Question of interest 1: do people have different realities of data?
If this research produces nothing else, it must investigate whether people have different
conceptions of data. This idea was the central intuition that prompted this research, and
its testing will demonstrate whether or not there is anything to my intuition.
As the organizing factor of my analysis, this question of interest will focus my
activities. It will justify further research on the nature and subjective constructions of
data from my experimental results, or else its demonstrable failure will justify not doing
so.
The question “Do people have different realities of data?” defines an overly large
universe of discourse, one impossible to study at a useful level of granularity in one
research project. The very breadth of the question precludes the determination of any
useful and specific facts about the world besides simple exploration of the assertion that
people have different understandings of data. The intent of this research and of this question is to generate interest in the research of the nature of data, how people understand
it, and to demonstrate that there are potential areas of philosophy to research.
I want to see if, beyond my intuitive insight, people actually have different conceptions of data or if my perception of different conceptions is an artifact of the requirements-gathering process of designing a database. It is therefore not sufficient to state that people have different understandings of data depending on whether they are dealing with it
in a technical or scientific context. We must look for evidence.
This question of interest, in its reach, is not ambitious. It suggests no predictions
about people’s conceptions of data, how they act with different realities of data, or any
other fact about the world. Instead, it simply directs us to see if there is anything of
interest for further explorations.
4.2.1.2 Question of interest 2: can my methodology probe people’s realities of data?
My methodology has a simple job: to assess what people mean when they use the term
“data.” This question of interest is designed as a sanity check. I am investigating a new
idea with an untested methodology. It is vital to consider that the success or failure
of Question of interest 1 is directly modulated by the success or failure of Question of
interest 2. Therefore, the methodology itself deserves distinct analysis.
The methodology should be of use to more people than just those investigating
people’s conceptions of data. If the methodology is useful and judged to add value to
Question of interest 1, analysis of the methodology should indicate whether other people
could use it to investigate matters of interest to them.
Question of interest 2 is asking: do these results make sense? Sense-making is a
matter of internal and external consistency. This question should force me to explore
whether the SDFN correlates with interview results and whether the types of results make
sense relative to the survey.
Beyond consistency, I must also ask: Is it possible to get these results from this
methodology? In this case, I need to make sure that I am not reading imaginary meaning
in the tea leaves of the results. Because this kind of external self-reflection is difficult, the
question must be simplified to: Do the results surprise me? If they do not have elements
of surprise, then the probability that I am projecting meaning into them must be strongly
considered.
All of these are very self-critical questions, as they must be to explore the impact of
an untested methodology. I am trying to consider whether my methodology can present
a persuasive story, and if it can, does it?
4.3 Interview Analysis
My interview analysis discovered three different conceptions of data. It would be hard
to deny that interviews I and II treat data as communication, III and IV treat it as subjective
observations (with IX hinting at them), and the rest consider data as objective fact.
With these broad differences evident, I feel question of interest 1 has been satisfied.
The observation constructions differ strikingly from the numeric constructions,
possibly differing on a fundamental perception of reality. While one interview tries to
render the relationships between matter in the world as numbers (objectivist), another
suggests that everything emits data and we must filter it. The conflict is records
versus measurements versus signs. Does data measure objective reality, record subjective
reality, or merely transmit signs? Numbers are seen as a result of precision most of the
time, whereas observations are building their way towards knowledge.
4.3.1 Result 1: Data as communications
Data, in the communicative sense, merely requires signs and things to communicate with
those signs. The data can be rendered as bits or marks on paper, but it is seen as a factor
of semiotic import rather than as something to be discovered or filtered.
This construction is substantively different from the other two inasmuch as it does
not uphold data to be an aspect of reality. Instead, data is produced as a function of
human intent. Because this understanding does not concern itself with interactions of
the real, there is a far greater difference between this and the other two than between
the subjective-objective constructions. However, the passivity of this construction allows
it to accept facts produced from either source as something to be encoded, stored, and
transmitted. Significant research needs to be done to explore how this construction of
data relates to the other two.
4.3.2 Result 2: Data as subjective observations
Data, in the subjective-observation sense, requires contextualization and filtering. Everything
emits data as sense impressions * that can be captured by us. Thus, to perform sense-making activities, we must filter and contextualize the interesting data so that it can
become information.
* Like the ancient Aristotelian idea of species (particles of sensation). Light was the medium that visual
species traveled within.
While this ancient philosophy of image is not hugely useful to us, the same intuitions that led to it could
have some parallels with data as subjective observations. This research area could make an interesting
bridge between intuitive and experimental philosophies.
Subjective data lends itself more to cyclic hierarchies, where data begets the information and knowledge used to collect more data, reflecting an interestingly constructivist view of knowledge. There is quite a lot here available to future research, and I do
not feel sufficiently confident in my sample size to make any assertions as to relationships between data and the various philosophies of knowledge or science, though the
subjective nature of observations may tend slightly more towards Latour or Feyerabend.
Of more interest is that this inherently subjective data is constructed from the
mind’s impressions of the surroundings, rather than revealed through measurement
of the surroundings. The understanding of the embodiment of data is a significant
difference between the two understandings of data.
4.3.3 Result 3: Data as measured facts
Objective data comes with its own context “baked in.” It is, in many ways, rare: it requires
positive effort to generate, and higher quality data requires a commensurate increase in
effort. Data requires analysis to uncover the extant patterns of reality, and with enough
data, knowledge about the singular real can be generated.
Objective data requires that data be a fact, usually a numerical, reproducible representation of reality that conveys an understanding of measurement quality and units.
Objective data is not filtered, because it is collected with prior intent and all elements of
the “data set” may produce interesting patterns.
Both humans and sensors can reveal objective data, which is embodied in the things
being measured. There seems to be no significant link with any of the major philosophies
of science. Although my investigations did not explore confirmation, falsifiability, or
paradigms, there seems to be a common understanding that data-as-fact accurately
represents the universe within the constraints of measurement. This may be because
the participants believed data to be a building block upon which their hypotheses or
understanding of the universe could be built.
5 Literature Review
The aim of this thesis is to ultimately facilitate better workplace communication, user
interfaces, and database design and management. To do that, I borrow heavily from concept elicitation methodologies in order to produce personal constructs of
data. These personal constructs of data, rendered in a concept map, allow for explicit
exposition of the concept of data in a workplace and thereby reduce miscommunication
through self-aware modification to available mental maps of the purpose and role of
data.
Concept elicitation methodologies are a subset of knowledge elicitation methods, a
tool used in many disciplines to “obtain the information needed to solve problems” [11].
Knowledge elicitation, in the main, is focused on direct problem solving: exploring
requirements and understanding the meanings of those requirements. However, by turning the techniques of knowledge elicitation onto epistemological questions of category,
we can discover not the direct meaning behind requirements, but some of a person’s
semiotic models of the constructions behind those requirements.
My research looks to investigate a person’s personal construction of data. I borrow
from data flow diagrams with a similar intent to the RepGrid methodology, though the
end product differs significantly. The idea of personal constructs, discussed by Kelly
and Tan [12 and 13], has been reformulated under many names: terms that have been used
to describe these things include “schemas” [14 and 15], “cognitive maps” [16 and 17],
“technological frames” [18], and “mental models” [19]. I, like Tan, will use personal
constructions as the operative term.
Kelly [12] describes a personal construction as a combination of philosophy and
psychology. A construct, being subjective, is a personal epistemological tool of categorization and differentiation: “A construct is a way in which some things are construed
as being alike and yet different from others.” His thesis denotes constructs as framing
devices where we can situate objects-as-signs in our way of knowing. He continues, “We
have departed from conventional logic by assuming that the construct is just as pertinent to some of the things which are seen as different as it is to the things which are
seen as alike.” Here, the fact that an object is not categorized as something can be an
important factor in a person’s personal construction of reality. Constructs are bipolar,
admitting knowledge of the sign/concept and its opposite rather than simple negation.
The SDFN extends this bipolar methodology of construction by asking people
to categorize elements as data, information, or knowledge. By articulating a tripolar
construction, we not only can articulate the positive categorizations of data, but can
more closely examine data as it transforms into specifically delineated categories.
Much would be lost if participants were asked to categorize “data or not data” as
the “not data” construction comprises everything that is not data, and is therefore not
particularly interesting as a means of indicating the ontological and epistemological
affordances of data. By requiring positive categorization, relationships between data and
other concepts can be elicited more easily than simple negation would warrant. However,
I also recognize that a given categorization may simply be irrelevant with respect to data
(relevancy is a far more useful and pragmatic benchmark than negation). Kelly notes that
personal constructions are bounded [12], and are not necessarily “convenient” methods
of categorization. In that light, the interview methodology will allow participants to articulate other categories that do not belong to the trinary construct of data-information-knowledge.
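As a minimal sketch of how such a categorization might be recorded, the following Python groups elements by the category a participant assigned them and keeps any category outside the data-information-knowledge trinary under its own label. The function names, the pair-based input format, and the example entries are all assumptions made for illustration; they are not part of the SDFN as conducted.

```python
from collections import defaultdict

# The three poles named in the text; anything else is treated as a
# participant-supplied category (an assumption of this sketch).
POLES = ("data", "information", "knowledge")

def group_by_category(assignments):
    """Group elements by the category a participant assigned them.

    `assignments` is a hypothetical list of (element, category) pairs.
    Categories outside POLES are kept under their own label, not discarded.
    """
    grouped = defaultdict(list)
    for element, category in assignments:
        grouped[category.strip().lower()].append(element)
    return dict(grouped)

def extra_categories(grouped):
    """Return the categories that fall outside the three poles, sorted."""
    return sorted(c for c in grouped if c not in POLES)

# Invented example entries, for illustration only.
example = [
    ("thermometer reading", "data"),
    ("weekly temperature chart", "information"),
    ("seasonal planting rule", "knowledge"),
    ("gut feeling", "intuition"),
]
grouped = group_by_category(example)
```

Keeping the extra categories rather than forcing them into one of the three poles mirrors the point made above: relevancy, not negation, is the benchmark.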
The RepGrid [13] is a similar concept elicitation method. Tan describes the IST uses
of the technique as: “a set of procedures for uncovering the personal constructs individuals use to structure and interpret events relating to the development, implementation,
use, and management of IST in organizations.” While it is more overtly focused on organizational modelling, and the interpretation of events, it is a study of cognitive processes in
an organizational setting to more effectively articulate information system requirements.
The RepGrid relies on participants sorting a pre-established schema of entities or objects,
defined as a common set of “nouns or verbs,” into constructs, the framing understanding
around those concepts. Tan describes RepGrid concepts as: “Constructs represent the
research participant’s interpretations of the elements. Further understanding of these
interpretations may be gained by eliciting contrasts resulting in bi-polar labels. Using
the same example, research participants may come up with bi-polar constructs such
as “high user involvement – low user involvement” to differentiate the elements (i.e., IS
projects).” The creation of framing dichotomies echoes the construct framework of Kelly
and then allows users to sort elements within those constructs with a variety of different
methods.
However, the RepGrid is not the best tool for understanding constructions of data:
while it does articulate a dichotomy, it fails to expose the manipulations attached to
data. Elicitation of affordances and transformations of data is crucial to understanding a
person’s construction of data in sufficient detail to provide useful tools designed for them.
Furthermore, while the statistical reliability of the RepGrid is appreciated, especially as
it can be subject to content analysis through simple frequency counting, the lack of an
explicit period for participants to articulate their self-schemata robs interviewers of the
potential insights of an articulated schema.
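The frequency counting mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input format (each participant's list of bipolar construct labels) and the function name are hypothetical, and the example labels reuse Tan's “high user involvement – low user involvement” construct alongside an invented one.

```python
from collections import Counter

def construct_frequencies(grids):
    """Count how many participants used each bipolar construct.

    `grids` is a hypothetical mapping of participant id -> list of
    (pole, opposite_pole) pairs. A construct counts once per participant,
    and the order of the two poles does not matter.
    """
    counts = Counter()
    for constructs in grids.values():
        # frozenset makes ("a", "b") and ("b", "a") the same construct
        counts.update({frozenset(pair) for pair in constructs})
    return counts

# Illustrative input only; not data from any study.
grids = {
    "P1": [("high user involvement", "low user involvement")],
    "P2": [("low user involvement", "high user involvement"),
           ("clear requirements", "vague requirements")],
}
frequencies = construct_frequencies(grids)
involvement = frozenset({"high user involvement", "low user involvement"})
```

A tally of this kind shows which constructs recur across participants, but, as argued above, it says nothing about how a participant would articulate the schema behind them.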
A repertory grid draws on the personal construct framework for its own
purposes of organizational knowledge modelling. In many ways, a “RepGrid” is a means
of evaluating a social construction of reality, as discussed by Berger and Luckmann [20].
The social construction of reality echoes the idea of personal constructions (though never
explicitly calls out the term) by evoking the different realities of objects, “Different
objects present themselves to consciousness as constituents of different spheres of reality.
I recognize the fellowmen I must deal with in the course of everyday life as pertaining to
a reality quite different from the disembodied figures that appear in my dreams. The two
sets of objects introduce quite different tensions into my consciousness and I am attentive
to them in quite different ways.” This evocation of personal constructions framing the
affordances of interaction was one of the other inspirations behind this project. While
Berger & Luckmann articulate the primacy of our shared reality, this investigation explores
one area where that shared understanding may break down.
Shared understandings of reality are encoded as self-schemata and expressed as
understandings of terms. While this practice should just as easily be expressed as a
linguistic pursuit, the aim of this investigation is to uncover elements of that primal construction of reality, not differences in the linguistic expression of that construction. I have
found that the best way to explore an individual’s construction of reality is to ask them
to express that reality in database design. The act of rendering the real-in-mind into
diagrams expressing that reality causes an awareness of the self-schemata to coalesce simply by
bringing it to the forefront of consciousness. Through introspection into cognitive activity,
self-schemata are formed: “attempts to organize, summarize, or explain one’s own behavior in a particular domain will result in the formation of cognitive structures about the
self or what might be called self-schemata. Self-schemata are cognitive generalizations
about the self, derived from past experience, that organize and guide the processing
of self-related information contained in the individual’s social experiences.” [21] It is
this very process which the creation of the data flow diagram occasions in regards to an
individual’s data manipulation activities. Furthermore, it is this act of schemata creation
and subsequent discussion that I aim to elicit with the SDFN.
The idea of schemata qua personal constructions of reality influencing human-computer
interfaces and system design is not novel, though in the HCI field the term
“mental model” is used. Wilson and Rutherford were exploring this very topic in 1989.
Specifically, while they identify a significant variation in the definitions of the term
“mental model,” they generalize the term to: “a representation formed by a user of a
system and/or task, based on previous experience as well as current observation, which
provides most (if not all) of their subsequent system understanding and consequently
dictates the level of task performance.” [22] The definitions they synthesize this from
extend back into the seventies, and there is no fundamental disagreement that the
practice of human-computer interaction is, in some way, the practice of presenting an
interface to these mental models.
It is important to note that there are philosophical distinctions between the terms
mental model, personal construction, and self-schemata. A personal construction is, in
many ways, the philosophical reality of a term. The construction provides for understanding of when and how to use the term for all use cases as well as its personal and
cultural semiotic identifications. A self-schema is the articulated and explicit epistemological conception of the term: it is an individual’s developed understanding of
how they categorize and use a term. A mental model, on the other hand,
is the situated understanding in procedural memory. These mental models are, themselves, socially constructed through routines in organizations [23]. The mental model
is the procedural manifestation of the personal construction in the recognized semiotic
affordances of the concept of data.
Extending the mental model to expected manipulations of data, rather than expected
interactions with a system, is the province of the DFD, though the DFD holds to
an objective reality which synthesizes many mental models. The SDFN, therefore, is a
way to inspect the subjective mental models of humans as they relate to the expected
interactions and transformations that their world applies to the thing they call data. As
the term is never formally taught, we must evolve our models by experience with the
world. Rasmussen asserts that mental models evolve with world-experience: “A mental
model of a physical environment is a causal model structured in terms of objects with
familiar functional properties. The objects interact in events, i.e., by state changes that
propagate through the system.” “Kelly argues that individuals use their own personal
constructs to understand and interpret events that occur around them and that these
constructs are tempered by the individual’s experiences.” [24]
As our experience with the world differs, so too must our models diverge to make
individual predictions about the systems we encounter in our subjective, constructed,
reality. Through articulated schema creation, we can expose a person’s mental map in a
sufficiently valid framework for database designers and philosophers to puzzle over.
5.1 Hierarchies of Data
This work is not the first to ponder the nature of data. There exist two significant and pre-established relationships of data to information and knowledge: Ackoff’s and Tuomi’s.
My findings mostly tend to echo the realities of data described by Ackoff or Tuomi.
While not every interview or survey articulates a hierarchical relationship between data,
information, and knowledge, it is clear that Ackoff’s work has entered the “common
knowledge”. A number of interviewees discussed a hierarchy of data first promulgated