These functions, grouped together and still in beta testing, start from a source text or a part of it
(no shorter than a single verse) and search the whole Musisque Deoque corpus, or one of its sections, for verbal and also non-verbal rhythmic similarities.
Once a source text has been chosen and a few main options set, the search engine draws the scholar's attention to a number of potentially significant results,
picked out from a usually huge mass of irrelevant material.
The functions, although complementary in their goals, fall into two different types (lexical co-occurrences; metrical and verbal co-occurrences), each handled on a separate page of the application.
Scoring and selection criteria
Both approaches of the co-occurrence search tool (lexical; metrical and verbal) produce an overabundance of results, but scholars
should not be overwhelmed by it.
Following this assumption, our choices start from the certainty that, however much the tool's power of discrimination may be improved,
significant results will always be surrounded by background noise. We therefore think it more profitable,
if perhaps less spectacular, not to aim at a perfect tool that hands scholars a neat, ready-made result,
but to supply them with a stock of filters and different ways of reading the results,
which help them, under the irreplaceable guidance of their own acumen and experience, to find a nugget among the pebbles.
In practice, this means that we do not push the machine's automatic selection to its limits, to avoid the risk of losing valuable results,
which are often hidden where we do not expect them. Of course, we could not do without a vigorous initial selection of the results,
designed to discriminate more at the lower levels than at the higher ones, that is, aimed more at removing the mediocre results so that the excellent ones come to the surface.
Main criteria for the selection:
-
Lexical co-occurrences
The scoring system for co-occurrences is based on: identity of forms, order of the two words,
comparison between the words' distance in the source text and in the target, and position within the verse.
As regards this last criterion, each word that occupies the same significant position (first, second, penultimate, last)
in the two compared texts earns one score point; moreover, if only one of the words keeps its position but the distance between the two words is the same,
the result earns two score points.
-
Metrical and verbal co-occurrences
-
search for words: only occurrences matching at least 4 syllables in one or more words are accepted;
the occurrences are then ranked by importance, on the basis of the number of corresponding syllables, the proximity of the words found and
the equivalence of consonants;
-
search for sequences of syllables: only occurrences with a certain number of identities in the consonantal part
of the syllables are accepted.
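The position criterion for lexical co-occurrences described above can be sketched as follows. This is only an illustration of one possible reading of the rule, not the code of the application: the function names, the use of 0-based word indices and the way the two-point case is decided are all assumptions.

```python
def slots(index, length):
    """Return the named significant positions held by the word at
    0-based `index` in a verse of `length` words (assumed representation)."""
    names = set()
    if index == 0:
        names.add("first")
    if index == 1:
        names.add("second")
    if index == length - 2:
        names.add("penultimate")
    if index == length - 1:
        names.add("last")
    return names


def position_score(src_len, src_pair, tgt_len, tgt_pair):
    """Score the position criterion for one co-occurrence.

    `src_pair` and `tgt_pair` are (index of word A, index of word B)
    in the source and target verse respectively.
    """
    # One point for each word sharing a significant position in both verses.
    matches = sum(
        1
        for s, t in zip(src_pair, tgt_pair)
        if slots(s, src_len) & slots(t, tgt_len)
    )
    # Assumed reading: if only one word keeps its position but the
    # distance between the two words is unchanged, the result earns 2.
    same_distance = (src_pair[1] - src_pair[0]) == (tgt_pair[1] - tgt_pair[0])
    if matches == 1 and same_distance:
        return 2
    return matches


print(position_score(6, (0, 3), 7, (0, 3)))  # 2: one position kept, same distance
print(position_score(6, (0, 5), 7, (0, 6)))  # 2: first and last both kept
print(position_score(6, (0, 3), 7, (0, 4)))  # 1: one position kept, distance differs
```

In this sketch the position score is only one component; the other criteria (identity of forms, order, distance) would contribute their own points.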
Lexical co-occurrences: search by lemmas
In the lexical co-occurrence search, the Search by lemmas option is set by default. It is worth pointing out that this option applies the same
rules as the Musisque Deoque advanced search: the search by lemmas does not extend to all the inflected forms that can be traced back to a lemma, but only to
those with the same number of syllables as the source form; on the other hand, it can also extend to other lemmas, namely compounds that differ only in their prefix (for example, advenio and pervenio).
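As a minimal sketch of the two rules just described (same number of syllables; compounds differing only in their prefix), one could write something like the code below. The toy lexicon, the hand-supplied syllable counts, the prefix list and the function names are all hypothetical and serve only to illustrate the filter; they do not reflect how the Musisque Deoque engine stores or analyses its data.

```python
# Hypothetical, deliberately incomplete list of verbal prefixes.
PREFIXES = ("ad", "per", "con", "in")

# Toy lexicon: inflected form -> (lemma, syllable count entered by hand).
LEXICON = {
    "advenit": ("advenio", 3),
    "advenerat": ("advenio", 4),
    "pervenit": ("pervenio", 3),
    "venit": ("venio", 2),
    "canit": ("cano", 2),
}


def split_prefix(lemma):
    """Return (prefix, base) if the lemma starts with a known prefix,
    otherwise (None, lemma)."""
    for prefix in PREFIXES:
        if lemma.startswith(prefix) and len(lemma) > len(prefix):
            return prefix, lemma[len(prefix):]
    return None, lemma


def lemma_match(source_form, candidate_form):
    """True if the candidate is accepted for the source form."""
    src_lemma, src_syllables = LEXICON[source_form]
    cand_lemma, cand_syllables = LEXICON[candidate_form]
    # Rule 1: only forms with the same number of syllables are kept.
    if src_syllables != cand_syllables:
        return False
    # Same lemma: accepted.
    if src_lemma == cand_lemma:
        return True
    # Rule 2 (assumed reading): both lemmas are compounds of the same
    # base and differ only in their prefix.
    src_prefix, src_base = split_prefix(src_lemma)
    cand_prefix, cand_base = split_prefix(cand_lemma)
    return src_prefix is not None and cand_prefix is not None and src_base == cand_base


print(lemma_match("advenit", "pervenit"))   # True
print(lemma_match("advenit", "advenerat"))  # False
```

Here advenit finds pervenit (a different lemma with the same base and the same number of syllables) but not advenerat (the same lemma with a different number of syllables).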
Verbal and metrical co-occurrences: matched search
One might think that the search for sequences of syllables, which works at a coarser level than the search for words, would also include the results of the latter.
But this is not the case, and it is not difficult to explain why:
-
the search for words can intercept non-consecutive words, each below the detection threshold of a syllabic sequence (4 or 5),
but together yielding a result accepted by that kind of search (> 3 syllables); for example, two non-consecutive words of two syllables each
are not found by the second approach, even when it works on sequences of 4 (it would have to work on sequences of 2, but the number of results would then be unmanageable);
-
the scoring criteria of the two methods are necessarily quite different and the second one can exclude results accepted by the first one.
We should therefore think of these two approaches not as interchangeable or alternative, but as complementary,
because, although their results partly overlap, each one provides some significant results that the other does not.
This is why it is possible to combine them in a single call, which merges the two series of results and removes redundancies.
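The combined call can be pictured, very roughly, as a union of the two result lists with duplicates removed. The sketch below assumes that each result is a dictionary with a hypothetical "verse" identifier and a "score", and that when both searches find the same verse the first hit encountered is kept; the actual application may well identify and reconcile duplicates differently.

```python
def combine(word_hits, sequence_hits):
    """Merge two result lists, keeping one hit per target verse.

    Assumption: a hit is a dict with a "verse" key identifying the
    target verse; the first hit seen for a verse wins.
    """
    seen = set()
    merged = []
    for hit in list(word_hits) + list(sequence_hits):
        key = hit["verse"]
        if key not in seen:
            seen.add(key)
            merged.append(hit)
    return merged


# Placeholder identifiers, not real corpus references.
word_hits = [{"verse": "A 12", "score": 7}, {"verse": "B 3", "score": 5}]
sequence_hits = [{"verse": "B 3", "score": 4}, {"verse": "C 40", "score": 6}]

for hit in combine(word_hits, sequence_hits):
    print(hit["verse"], hit["score"])
```

Verse "B 3" is returned by both searches but appears only once in the merged list, while "A 12" and "C 40" are each contributed by a single method.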