System on the complete works

Alberto Acquaro's studio

Guided tour of the system

    page 10 of 11    



This research aims to contribute to the long-standing problem of whether or not Dante wrote "Il Fiore" (a poem consisting of 232 sonnets). The entry for "Il Fiore" in the "Enciclopedia dantesca - Ed. Treccani" (Dantean Encyclopedia) gives an idea of the current views on the subject, proposed by different groups of equally authoritative critics. In the absence of decisive historical proof, several hypotheses have been put forward, all still widely debated.

Considering this situation, previous experience in the field of Statistics led us to think that conditions were extremely favourable for a statistical study on the subject.

The original idea was to take three samples, starting with the Rhymes in the "DANTE 2000" Archive, the first consisting of sonnets by Dante, the second sonnets from "Il Fiore" and the third sonnets we know Dante did not write. These three samples could then be studied at a grammatical level to be decided later.

But this original idea was subsequently changed when a brand new Chapter was added to "DANTE 2000", dedicated to Coeval Authors. With such a large number of sonnets available (over 850) by so many different Authors, it became immediately apparent that our research should be conducted differently and that we should exploit the enormous amount of information we had available. This allowed us to lower the grammatical level of the study, which, contrary to what might be expected, increases the power of the statistical test.
With this information at hand, we thought that the best variable to study was the "absolute frequency of characters". At first we considered all the characters that occur. But subsequent processing revealed that some of the orthographic elements interfered with the test we wanted to use (the Chi-squared test). For this reason, and because some of the punctuation marks may have been introduced or changed by the copyists, we decided to limit the study to the letters of the alphabet and the apostrophe (22 characters in total: the 21 letters of the Italian alphabet plus the apostrophe).
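The counting step described above can be illustrated with a short Python sketch. It is not the code used in "DANTE 2000": the names ALPHABET and char_frequencies are hypothetical, and the handling of accented vowels and typographic apostrophes is an assumption made only for this illustration.

```python
import unicodedata
from collections import Counter

# The 21 letters of the standard Italian alphabet plus the apostrophe (22 symbols).
ALPHABET = "abcdefghilmnopqrstuvz'"

def char_frequencies(text):
    """Return the absolute frequencies of the 22 tracked characters in `text`."""
    # Assumption: typographic apostrophes count as plain apostrophes and
    # accented vowels are folded onto their base letter.
    text = text.replace("\u2019", "'").lower()
    decomposed = unicodedata.normalize("NFD", text)
    # Everything outside ALPHABET (spaces, punctuation, accents) is ignored.
    counts = Counter(ch for ch in decomposed if ch in ALPHABET)
    return [counts[ch] for ch in ALPHABET]

line = "Nel mezzo del cammin di nostra vita"
print(dict(zip(ALPHABET, char_frequencies(line))))
```

Each sample of 55 sonnets would then be summarised by one such list of 22 absolute frequencies.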

Where possible, we used samples of 55 sonnets. We chose this number because it is the number of sonnets that Dante wrote and, at the same time, it guarantees a good level of significance in comparisons between samples. This choice gave us 15 samples, listed in the table on the page in Figure 25 (from sample A to sample Q, lettered according to the Italian alphabet, which has no J or K).
Sample R, relating to 6 modern authors, serves for further study, which we shall explain shortly.

See Figure 25 - Video-page showing the conclusions of the Chi-square test applied to sonnet samples A and C.

In particular, take a special look at B, containing works by 5 different Authors, each with 11 sonnets; this type of sample was compiled in the hope of contributing more information for the general pattern of the results.

By considering, where possible, more than one sample by the same Author, we can obviously compare pairs of these samples with pairs of samples from different Authors. This sort of comparison refers to the first hypothesis to be tested, i.e. whether "significant differences" exist between pairs by different Authors.

We also wanted to exploit the enormous amount of data available and consider yet another hypothesis, even though it may be less probable. This second hypothesis holds that differences observed in the use of characters over time mark a kind of "evolution" in writings by the same author. For this purpose, when we had more than one sample per Author (the Author of "Il Fiore", Guittone d'Arezzo, Cino da Pistoia and Cecco Angiolieri) we compared samples that were "spread" over time.
For example, in the case of "Il Fiore", the first sample (C) comprises sonnets 1, 4, 7, etc.; the second (D) sonnets 2, 5, 8, etc.; the third (E) sonnets 3, 6, 9, etc. Assuming that the order of the sonnets is based on their date of composition, then samples C, D and E are independent of time; on the contrary, our fourth sample (F) from "Il Fiore", does not have the same characteristics - it is not independent of time since it consists of the last 55 sonnets in the collection.
We used a similar approach for the works of Guittone d'Arezzo. Obviously we could not take samples that were "spread" over time for either Cino da Pistoia or Cecco Angiolieri, for whom we have only two samples.
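The construction of the four "Il Fiore" samples can be sketched as follows. This is only an illustration: the function name interleaved_sample is hypothetical, and we assume that each "spread" sample simply stops once it contains 55 sonnets, which the text does not state explicitly.

```python
TOTAL_SONNETS = 232   # size of "Il Fiore", as stated in the text
SAMPLE_SIZE = 55      # sample size used throughout the study

def interleaved_sample(offset, step=3, size=SAMPLE_SIZE):
    """Sonnet numbers offset, offset + step, ... (1-based), `size` of them."""
    return [offset + step * i for i in range(size)]

sample_c = interleaved_sample(1)   # sonnets 1, 4, 7, ...
sample_d = interleaved_sample(2)   # sonnets 2, 5, 8, ...
sample_e = interleaved_sample(3)   # sonnets 3, 6, 9, ...

# Sample F is not "spread": it is the last 55 sonnets of the collection.
sample_f = list(range(TOTAL_SONNETS - SAMPLE_SIZE + 1, TOTAL_SONNETS + 1))

print(sample_c[:3], sample_d[:3], sample_e[:3])
print(sample_f[0], sample_f[-1])
```

Under this assumption C, D and E draw evenly on the same stretch of the collection, while F is concentrated at its end, which is what makes F dependent on time if the sonnets are ordered by date of composition.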

Considering all the possible combinations, the 15 samples of sonnets meant there were 105 comparisons, one for each possible pair.
The statistical test used for this sort of comparison is the Chi-squared test, which reveals differences between samples and evaluates the significance level of those differences. The absolute frequencies observed (raw figures) were normalised with respect to the mean number of characters per sample, calculated over all samples. The Table of Results is given on the page in Figure 26.
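As a minimal sketch of the procedure just described, the following Python code enumerates the 105 pairs, rescales each sample to a common total and computes the Pearson Chi-squared statistic for a pair. The function names and the toy counts are hypothetical, and the rescaling shown is only our reading of the normalisation described above; the exact computation used in "DANTE 2000" may differ.

```python
from itertools import combinations

# Sample labels A..Q in the Italian alphabet (no J or K): 15 labels.
LABELS = list("ABCDEFGHILMNOPQ")

def normalise(counts, target_total):
    """Rescale one sample's frequencies so that they sum to target_total."""
    factor = target_total / sum(counts)
    return [c * factor for c in counts]

def chi_squared(sample_1, sample_2):
    """Pearson Chi-squared statistic for a 2 x k table of frequencies."""
    grand_total = sum(sample_1) + sum(sample_2)
    statistic = 0.0
    for row in (sample_1, sample_2):
        row_total = sum(row)
        for observed, a, b in zip(row, sample_1, sample_2):
            column_total = a + b
            if column_total == 0:
                continue  # character absent from both samples
            expected = row_total * column_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic

pairs = list(combinations(LABELS, 2))
print(len(pairs))  # 105

# Toy counts (made up, 3 characters only) to show the normalisation step.
raw = {"X": [30, 50, 20], "Y": [45, 40, 35]}
mean_total = sum(sum(v) for v in raw.values()) / len(raw)
scaled = {name: normalise(v, mean_total) for name, v in raw.items()}
print(chi_squared(scaled["X"], scaled["Y"]))
```

With real data each list would hold 22 frequencies, giving the 21 degrees of freedom used when reading the Chi-squared Table.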

See Figure 26 - Video-page showing the Results for the sonnet samples.

Comparison between samples from modern and "medieval" Authors

This test was performed after we had already obtained the results on the question of who wrote "Il Fiore", results which demonstrate the surprising efficiency of the statistical test (christened the PATERTEST). Indeed, those results suggested a further test: comparing the samples already available with samples taken from the contemporary Italian language. We therefore prepared sample R, which consists of 55 sonnets, 10 of which are by Praga and 9 by each of the following Authors: Camerana, Carducci, Pascoli, Gozzano and Corazzini. The following gives an idea of the importance of the results of this latest study.


We should now like to make a few remarks about the data shown in the Table for "Over-all view of results" of Figure 26. These data correspond to the values of the Chi-squared test applied to all the 105 possible pairs from the 15 samples (from A to Q).
Data relative to sample R of modern Authors are given in the last column (grey), at the bottom of the Table.

First let's explain what the Chi-squared values in the table mean. The value of this index of association for a particular pair of samples furnishes a measure of the significance of the "differences" observed between the two samples. The higher the value, the lower the probability of error (based on the Pizzetti-Pearson relation) in accepting the hypothesis that the samples we tested are "different" or, in other words, in affirming that the "differences" observed are not merely due to chance.

It is possible to obtain the probability of such an event by looking up the value for the Chi-squared in the appropriate Table. It would be too complicated to show the Table in the present context, so we refer the reader to "DANTE 2000", where you can consult the table by clicking on the "Chi-square Table" command on the video-page in Figure 25 and Figure 26. The following two examples show how to use the Table.
  • As our first example, let's take the value of 17.45, i.e. the result obtained from comparing samples C and D (both of which come from "Il Fiore" and are "spread" out over time); let's go to line 21 in the Table, as 21 is our number of degrees of freedom:

            [ (number of samples (2) - 1) x (number of characters (22) - 1) = 21 ] ;

    at this point we find that our value of 17.45 falls between 13.240 and 20.337 in the Table, corresponding to probability levels of 0.900 and 0.500 respectively. This means that the probability that the differences between the two samples are merely due to chance is between 0.900 (90%) and 0.500 (50%).
  • As our second example, let's take the value of 72.31, the result we get by comparing samples C and F. The first is "spread" over time but the second is not. Again let's look at line 21; this time we can see that our value (72.31) is greater than 49.011 in the Table, the value that corresponds to a probability level of 0.0005. This means we can say that the probability that the differences between the two samples are merely due to chance is lower than 0.0005 (0.05%).
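The two look-ups above can also be checked without a printed Table. The Python sketch below computes the degrees of freedom and the upper-tail probability of the Chi-squared distribution using a standard closed form that is valid for an odd number of degrees of freedom. The function names are hypothetical and this is not the routine used in "DANTE 2000".

```python
import math

def degrees_of_freedom(n_samples, n_characters):
    """(number of samples - 1) x (number of characters - 1)."""
    return (n_samples - 1) * (n_characters - 1)

def chi2_tail_probability(value, df):
    """P(X >= value) for a Chi-squared variable with an odd number df of degrees of freedom."""
    if df % 2 == 0:
        raise ValueError("this sketch only handles odd degrees of freedom")
    x = value / 2.0
    # Regularised upper incomplete gamma function Q(df / 2, x), built from
    # Q(1/2, x) = erfc(sqrt(x)) and
    # Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / gamma(a + 1).
    prob = math.erfc(math.sqrt(x))
    a = 0.5
    while a < df / 2.0:
        prob += x ** a * math.exp(-x) / math.gamma(a + 1.0)
        a += 1.0
    return prob

df = degrees_of_freedom(2, 22)
print(df)                                # 21
print(chi2_tail_probability(17.45, df))  # samples C and D
print(chi2_tail_probability(72.31, df))  # samples C and F
```

The first probability should fall between 0.500 and 0.900 and the second below 0.0005, in agreement with the ranges read from the Table.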

    To sum things up, the theory states that the difference between samples can be considered as "statistically significant" if the probability that it is due to chance is lower than 5%. For the samples in our examples we can therefore:
    • Reject the hypothesis that there is a difference between samples C and D ;
    • Accept the hypothesis that there is a difference between samples C and F.
    In our case, we can also say that the test revealed a significant "difference" between two samples by the same Author, i.e. between a first sample "spread" out over time and a second which was not.
    We can reach a similar conclusion for all the other values obtained with the Chi-squared test that compared samples from "Il Fiore" (see the first zone highlighted in yellow in Figure 26).
    Again the same considerations apply to samples G, H, I and L, relative to Guittone d'Arezzo (second zone highlighted in yellow).

    So far we have only talked about the statistical significance of some of the results and intentionally avoided discussing their interpretation. Let's start to do so now, remembering our two hypotheses, the verification of which lies at the base of our study: the main hypothesis is that the test can "feel" whether individual samples belong to their respective Authors, and the second hypothesis (much more optimistic), concerns the possibility that the test can also "feel" how the writing of an Author "evolves" over time.
    We are convinced that none of the comparisons we have made can, on its own, decisively settle our working hypotheses one way or the other, even though the results are highly significant. But we are just as convinced that, by taking the results as a whole, we can extract a considerable number of valuable clues pointing in the desired direction.

    If we take the second working hypothesis concerning the hypothetical evolution in the writings of the same author over time, then all the comparisons made with regard to this phenomenon, the results of which are highlighted in yellow, support the hypothesis and give high confidence levels. The test clearly "feels" the "differences" between samples by the same Author that are spread over time and those that are not. Our method of building up these samples guarantees that these "differences" are due to the possible dependence of the sample itself on time.

    Having confirmed our second working hypothesis (the more ambitious), we could well expect confirmation of our first hypothesis, i.e. that the test could distinguish between samples by the same or different Authors. Careful analysis of the results as a whole leads to the conclusion that in their entirety and with even higher confidence levels than in the previous case, they do indeed support the first working hypothesis.

    We can conclude by observing that the coherence in the pattern of results, of an "eloquence" rarely encountered in previous studies, can only be rejected if Dante actually did write "Il Fiore" (see in Figure 26 the values highlighted in violet).

    Comparison between a sample by modern Authors and the samples by medieval Authors

    The sample of Modern Authors (Praga, Camerana, Carducci, Pascoli, Gozzano and Corazzini) allows 15 comparisons with the other samples and therefore furnishes 15 values from the Chi-squared test, given in full in "DANTE 2000". Just a quick glance at these 15 values immediately gives an idea of the extraordinary power of the test; in this case too (after 700 years of evolution of the language), the results are significant. Although all the values for the Chi-squared test are high (which was to be expected), what is so amazing is the far lower value obtained from the comparison between Dante and Modern Authors.
    In our opinion, this result shows that more in-depth studies are necessary, and that the study should be extended to samples in prose. In any case, from now on, though not in this context, the results must be considered seriously since, in our opinion, they could have implications of an anthropological nature.

    Note dated 20.1.2003: In-depth research on the above point is now under way, in a different direction, with regard to the origins of Italian and four other European languages. The first results are not only interesting; in some cases they are quite surprising.


    "DANTE 2000" - Alberto Acquaro's studio -  [ Map ]

    Web-site by Filarete S.r.l.