-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathhw4p4.txt
27 lines (22 loc) · 1.07 KB
/
hw4p4.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
CondFreqDist frequency distribution
Bigram Count Frequency
and she 79 0.02758380
and Mr 37 0.01291899
and herself 10 0.00349162
and sister 3 0.00104749
and lady 1 0.00034916
and manner 3 0.00104749
and cried 1 0.00034916
and feelings 1 0.00034916
and pride 4 0.00139665
and great 0 0.00000000
The COUNTS of the bigram frequency and conditional frequency distributions are
naturally the same, because the COUNT simply represents the number of times the
bigram appears in the text - which doesn't change.
The frequencies reported by the conditional frequency distribution are higher
than those reported by the bigram frequency because they are relative to the
first word in the bigram. I.e., in assuming that the first word is given to us,
we are looking at the probability of the existence of any of the words that could
follow it, which must add up to 1. Were we to multiply this probability with the
probability of existence of the first word, we get the same exact probabilities
for each bigram reported by the bigram frequency distribution.