STUDY ON SAMPLES OF SONNETS
SETTING THE SEARCH
This study aims to contribute
to the long-standing question of whether Dante
wrote "Il Fiore" (a poem consisting of 232 sonnets).
If you look up "Il Fiore" in the "Enciclopedia dantesca" (Dantean
Encyclopedia, Ed. Treccani), you will get an idea of the current views
on the subject proposed by different groups of equally respected critics.
Without any decisive historical proof, several hypotheses have been
put forward, all still widely debated.
Considering this situation, previous experience in the field of Statistics
led us to think that conditions were extremely
favourable for a statistical study on the subject.
The original idea was to take three samples,
starting with the Rhymes in the "DANTE 2000" Archive, the first
consisting of sonnets by Dante, the second
sonnets from "Il Fiore" and the third
sonnets we know Dante did not write. These three samples could then
be studied at a grammatical level to be decided later.
But this original idea was subsequently changed
when a brand new Chapter was added to "DANTE 2000", dedicated to Coeval
Authors. With such a large number of sonnets available (over
850) by so many different Authors, it
became immediately apparent that our research should be conducted
differently and that we should exploit the enormous
amount of information we had available. This would allow us to
lower the grammatical level of the study
which, contrary to what might be expected, increases
the power of the statistical test.
With this information at hand, we thought that the best variable to
study was "absolute frequency of characters".
At first we considered all the characters that occur. But subsequent
analysis revealed that some of the orthographic elements interfered
with the test we wanted to use (the Chi-squared
test). Given this, together with doubts as to whether
some of the punctuation marks may have been introduced or changed
by the copyists, we decided to limit the study to the letters
of the alphabet and the apostrophe
(a total of 22 characters: the 21 letters of the Italian
alphabet plus the apostrophe).
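The counting step described above can be sketched as follows. This is only an illustration, not the procedure used in "DANTE 2000": the function name char_frequencies, the folding of accented vowels onto their base letters and the treatment of the typographic apostrophe are all assumptions.

```python
import unicodedata
from collections import Counter

# Assumption: the 22 characters are the 21 letters of the Italian
# alphabet (no j, k, w, x, y) plus the apostrophe.
ALPHABET = "abcdefghilmnopqrstuvz'"

def char_frequencies(text):
    """Return the absolute frequency of each of the 22 characters in text."""
    # Assumption: a typographic apostrophe counts as a plain apostrophe.
    text = text.replace("\u2019", "'").lower()
    # Assumption: accented vowels are folded onto their base letter
    # (NFD splits off the accent, which is then ignored as a non-letter).
    folded = unicodedata.normalize("NFD", text)
    counts = Counter(ch for ch in folded if ch in ALPHABET)
    # Every character of the alphabet appears in the result, even with count 0.
    return {ch: counts.get(ch, 0) for ch in ALPHABET}

freq = char_frequencies("L'amor che move il sole")
print(freq["l"], freq["'"])
```

Spaces, punctuation marks and any letter outside the 22 characters are simply ignored, which mirrors the decision to leave out elements that may have been altered by the copyists.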
Where possible, we used a sample size of 55
sonnets. We chose this number because it is the number of sonnets
that Dante wrote and at the same time guarantees a good level of significance
between samples. This choice gave us 15
samples, listed in the table (from sample A
to sample Q) on the page in
Figure 25.
Sample R, relating to 6 modern
authors, serves for further study, which we shall explain
shortly.
See Figure 25: Videopage showing the conclusions of the Chi-square test applied
to sonnet samples A and C.
In particular, note sample B,
which contains works by 5 different Authors,
each with 11 sonnets; this type of sample was compiled in the
hope of contributing more information to the general pattern of the
results.
By considering, where possible, more than one sample by the same Author,
we can obviously compare pairs of these samples with pairs of samples
from different Authors. This sort of comparison refers to the
first hypothesis to be tested, i.e. whether "significant
differences" exist between pairs by different Authors.
We also wanted to exploit the enormous amount of data available and
consider yet another hypothesis, even
though it may be less probable. This second hypothesis holds that
differences observed in the use of characters over time mark a kind
of "evolution" in writings by
the same author. For this purpose, when we had more
than one sample per Author (the Author of "Il Fiore", Guittone
d'Arezzo, Cino da Pistoia and Cecco Angiolieri) we compared samples
that were "spread" over time.
For example, in the case of "Il Fiore", the first
sample (C) comprises sonnets 1,
4, 7, etc.; the second
(D) sonnets 2, 5, 8, etc.;
the third (E)
sonnets 3, 6, 9, etc. Assuming that the order of the sonnets is based
on their date of composition, then samples C,
D and E
are independent of time; on the contrary,
our fourth sample (F)
from "Il Fiore" does not have
the same characteristics: it is not independent of time, since it
consists of the last 55 sonnets in the collection.
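As a rough illustration of the difference between a "spread" sample and one that is not, the sonnet numbers could be generated as below. The helper names are hypothetical, and the text does not say where each "1, 4, 7, etc." series stops: taking the first 55 terms is an assumption made here only to match the stated sample size.

```python
TOTAL_SONNETS = 232   # sonnets in "Il Fiore" (from the text)
SAMPLE_SIZE = 55      # sample size used where possible (from the text)

def spread_sample(first, size=SAMPLE_SIZE, step=3):
    """Sonnet numbers first, first+3, first+6, ... (assumed cut at `size` terms)."""
    return [first + step * i for i in range(size)]

def tail_sample(total=TOTAL_SONNETS, size=SAMPLE_SIZE):
    """The last `size` sonnets of the collection, as for sample F."""
    return list(range(total - size + 1, total + 1))

sample_c = spread_sample(1)   # 1, 4, 7, ...
sample_d = spread_sample(2)   # 2, 5, 8, ...
sample_e = spread_sample(3)   # 3, 6, 9, ...
sample_f = tail_sample()      # 178 ... 232

print(sample_c[:3], sample_f[0], sample_f[-1])
```

Under this assumption C, D and E interleave across the same stretch of the collection, while F is concentrated at its end.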
We used a similar approach for the works of Guittone d'Arezzo. Obviously
we could not take samples that were "spread" over time for either
Cino da Pistoia or Cecco Angiolieri, for whom we have only
two samples.
Considering all the possible combinations, the 15
samples of sonnets meant there were 105
comparisons, one for each possible pair.
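The figure of 105 is simply the number of unordered pairs that can be formed from 15 samples, 15 x 14 / 2. A short check, assuming the samples are labelled with the Italian alphabet from A to Q (which has no J or K):

```python
from itertools import combinations

# Assumption: 15 labels from A to Q in the Italian alphabet (no J, no K).
LABELS = list("ABCDEFGHILMNOPQ")

# Every unordered pair of distinct samples, e.g. ("C", "D") or ("C", "F").
pairs = list(combinations(LABELS, 2))

print(len(LABELS), len(pairs))
```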
The statistical test used for this sort of comparison is the Chi-squared test,
which reveals differences between samples and evaluates the significance
level of those differences. The absolute frequencies observed
(raw counts) were normalised with respect to the mean number
of characters per sample, calculated over all samples. The Table
of Results is given on the page in Figure 26.
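A minimal sketch of the two steps just described, normalisation and comparison of a pair of samples, is given below. It uses the standard Pearson Chi-squared statistic for a table of 2 rows (samples) by k columns (characters); the exact computation used in "DANTE 2000" is not given in the text, so the function names and details here are assumptions.

```python
def normalise(samples):
    """Rescale each sample so that its total equals the mean total of all samples."""
    totals = [sum(sample) for sample in samples]
    mean_total = sum(totals) / len(totals)
    return [[count * mean_total / total for count in sample]
            for sample, total in zip(samples, totals)]

def chi_squared(a, b):
    """Pearson Chi-squared statistic for two rows of character frequencies."""
    total_a, total_b = sum(a), sum(b)
    grand_total = total_a + total_b
    statistic = 0.0
    for x, y in zip(a, b):
        column = x + y
        if column == 0:
            continue  # character absent from both samples: no contribution
        expected_x = total_a * column / grand_total
        expected_y = total_b * column / grand_total
        statistic += (x - expected_x) ** 2 / expected_x
        statistic += (y - expected_y) ** 2 / expected_y
    return statistic

# Toy frequencies for three characters only (invented for illustration).
a, b = normalise([[10, 20, 30], [24, 36, 60]])
print(round(chi_squared(a, b), 3))
```

With real data each row would hold 22 frequencies, one per character, and the statistic would be computed for every pair of samples.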
See Figure 26: Videopage showing the Results for the sonnet samples.
Comparison between samples from modern
and "medieval" Authors
This test was performed after we had already obtained the results
on the question of who wrote "Il Fiore", results which demonstrate the
surprising efficiency of the statistical test (christened the PATERTEST).
Indeed, the results we obtained suggested that a further test would be worthwhile,
i.e. comparing the samples already available with samples taken from
the contemporary Italian language.
And so we prepared sample R,
which consists of 55 sonnets, 10 of which
are by Praga and 9 by each of the following
Authors: Camerana, Carducci, Pascoli, Gozzano
and Corazzini. The following gives an idea of the importance
of the results of this latest study.
DISCUSSION OF RESULTS
We should now like to make a few remarks about the data shown in the
Table for "Overall view of results"
in Figure
26. These data correspond to the values of the Chi-squared
test applied to all the 105 possible pairs
from the 15 samples (from A to
Q).
Data relative to sample R of modern
Authors are given in the last column (grey), at the bottom of the
Table.
First let's explain what the values of the Chi-squared
statistic mean in the table. The value of this index
of association in a given case, i.e. for a particular
pair of samples, provides a measure of the
significance of the "differences" observed between the two
samples. The higher the value,
the lower the probability of error
(based on the Pizzetti-Pearson relation) in accepting the hypothesis
that the samples we tested are "different", or, in other words, in
affirming that the "differences" observed
are not merely due to chance.
It is possible to obtain the probability of such an event by looking
up the value of the Chi-squared statistic in the appropriate
Table. It would be too complicated to show the Table
in the present context, so we refer the reader to "DANTE 2000", where
you can consult the table by clicking on the "Chi-square
Table" command on the videopages in Figure
25 and Figure
26. The following two examples show how to use the Table.
As our first example, let's take the value of 17.45,
i.e. the result obtained from comparing samples C
and D (both
of which come from "Il Fiore" and are "spread"
out over time); let's go to line
21 in the Table, as 21 is the number of degrees
of freedom:
[ (number of samples (2) - 1) x (number of characters (22) - 1) = 1 x 21 = 21 ];
at this point we find that our value of 17.45 falls between 13.240
and 20.337 in the Table, corresponding to probability levels of
0.900 and 0.500 respectively. This means that the probability that
the differences between the two samples are merely due to chance
is between 0.900 (90%) and 0.500 (50%).
As our second example, let's take the value of 72.31,
the result we get by comparing samples C
and F. The first is "spread"
over time but the second is not. Again let's look at
line 21; this time we can see that our value (72.31) is greater
than 49.011 in the Table, the value that corresponds to a probability
level of 0.0005. This means we can say that the probability that
the differences between the two samples are merely due to chance
is lower than 0.0005 (0.05%).
To sum up, the theory states that the difference between
samples can be considered "statistically
significant" if the probability that it is due to chance
is lower than 5%. For the samples
in our examples we can therefore:
- Reject the hypothesis that there
is a difference between samples C
and D;
- Accept the hypothesis
that there is a difference
between samples C and F.
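The table look-up and the 5% rule used in the two examples can be sketched as follows. Three of the four table entries for line 21 are the ones quoted in the examples; the entry 32.671 for the 0.050 level is not quoted in the text and is added here, as an assumption, from the standard Chi-squared table. The function names are hypothetical.

```python
def degrees_of_freedom(n_samples=2, n_characters=22):
    """(number of samples - 1) x (number of characters - 1)."""
    return (n_samples - 1) * (n_characters - 1)

# Line 21 of the Chi-squared table: (critical value, probability level).
# 32.671 for the 0.050 level is an assumption added for the 5% rule.
TABLE_LINE_21 = [(13.240, 0.900), (20.337, 0.500),
                 (32.671, 0.050), (49.011, 0.0005)]

def bracket_probability(value, table=TABLE_LINE_21):
    """Return (low, high) bounds on the probability that the difference is due to chance."""
    high = 1.0  # bound used when the value lies below every table entry
    for critical, probability in table:
        if value < critical:
            return (probability, high)
        high = probability
    return (0.0, high)  # value beyond the last table entry

def is_significant(value, threshold=0.05):
    """True when the probability of chance is at most the threshold."""
    _, high = bracket_probability(value)
    return high <= threshold

print(degrees_of_freedom())                           # line of the table to read
print(bracket_probability(17.45), is_significant(17.45))   # samples C and D
print(bracket_probability(72.31), is_significant(72.31))   # samples C and F
```

A value that falls exactly on a table entry has a probability equal to that level rather than strictly lower, a borderline case this sketch does not treat separately.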
In our case, we can also say that the test revealed a significant
"difference" between two samples by the same Author, i.e. between
a first sample "spread" out over time and a second which was not.
We can reach a similar conclusion for all the other values obtained
with the Chi-squared test that compared samples from "Il
Fiore" (see the first zone highlighted in yellow in Figure
26).
Again the same considerations apply to samples G,
H, I
and L, relative to Guittone
d'Arezzo (second zone highlighted in yellow).
So far we have only talked about the statistical significance of
some of the results and intentionally avoided discussing their interpretation.
Let's start to do so now, remembering our two
hypotheses, the verification of which lies at the base of
our study: the main hypothesis is that
the test can "feel" whether individual samples belong to their respective
Authors, and the second hypothesis
(much more optimistic), concerns the possibility that the test can
also "feel" how the writing of an Author "evolves" over time.
We are convinced that none of the comparisons we have made can, on its own, decisively
influence our working hypotheses one way or the other, even though
the results are highly significant. But we are just as convinced
that by taking the results as a whole,
we can extract a considerable number of valuable clues in the desired
direction.
If we take the second working hypothesis,
concerning the hypothetical evolution in the writings of the same
author over time, then all the
comparisons made with regard to this phenomenon, the results of
which are highlighted in yellow,
support the hypothesis and give high confidence levels. The test
clearly "feels" the "differences" between samples by the same Author
that are spread over time and those that are not. Our method of building
up these samples guarantees that these "differences" are due to
the possible dependence of the sample itself on time.
Having confirmed our second working hypothesis (the more ambitious), we
could well expect confirmation of our first
hypothesis, i.e. that the test could distinguish between
samples by the same or different Authors. Careful analysis of the
results as a whole leads to the conclusion that, in
their entirety and with even higher confidence levels
than in the previous case, they do indeed support the first working
hypothesis.
We can conclude by observing that the coherence
in the pattern of results, of an "eloquence" rarely encountered
in previous studies, can only be rejected if Dante actually did
write "Il Fiore" (see the values highlighted in violet in Figure
26).
Comparison between a sample by modern
Authors and the samples by medieval Authors
The sample of Modern Authors (Praga, Camerana, Carducci, Pascoli,
Gozzano and Corazzini) allows 15 comparisons
with the other samples and therefore furnishes 15 values from the
Chi-squared test, given in full in "DANTE 2000". Just a quick glance
at these 15 values immediately gives an
idea of the extraordinary power of the test; in this
case too (after 700 years of evolution of the language), the results
are significant. Although all the values for the Chi-squared test
are high (which was to be expected), what is striking is the far
lower value obtained from the comparison between Dante and Modern
Authors.
In our opinion, this result shows that more
in-depth studies are necessary, and that the study should
be extended to samples in prose. In any case, from now on, but not
in this context, the results must be considered seriously since, in
our opinion, they could have implications of an anthropological
nature.
Note dated 20.1.2003: In-depth
research on the above point is now underway, in a different direction,
with regard to the origins of Italian and four other European
languages. The first results are not only interesting; in some cases
they are quite surprising.

Guided Tour page 10
