Voynich Bombe

August 26, 2017

Of late, I have noticed a slight tendency to get bored with the “ugly duckling” alone, along with some annoyance at the current state of affairs in “Voynichism”.

The more I value the VMS as a hub for further research, the more I get drawn to other topics.

A while ago I found out for myself that the order of things needs to be reversed if one aspires to do serious research as an amateur. One cannot simply pick a feature, then compare everything else to it. No, it is hard work: one needs to look at everything else first, and then compare the feature of interest to what one has learned.

Simple as that.

The world is, and always was, full of interesting things. No one ever believed it was a disc.

“. . . the person who is used to inquiry tries every possible pathway as he conducts his search and turns in every direction, and, so far from giving up the inquiry in the space of a day, does not cease his search throughout his life: directing his attention to one thing after another that is relevant to what is being investigated, he presses on until he attains his goal.”

Erasistratos of Ioulis, Paralysis book 21 12; trans. by G.E.R. Lloyd, Greek Science After Aristotle [1973] 86

Romani Recursion – Curiosities and Trivia

“[…] However, Devanagari came into being after the Romani left India.” (from Wikipedia).

There does not seem to be much to add to the recurring popular topic in VMs research of a possible Brahman / Hindu origin transported via Rom & Dom migration. However, like so many other things related to VMs research, the connection looks like a promising starting point for further exploration.

A more recent attempt to construct a Romani alphabet based on Devanagari was not successful. The essay “Romani Orthographies”1 outlines some of the difficulties contemporary standardization attempts were & still are faced with. The necessity of concepts like archegraphemes or morpho-graphs does not make the task of constructing a writing system appear particularly simple.

There is an account of written Romani from ca. 1515, as outlined in a 2011 German article on hypotheses2, which in turn refers to Knauer 20103. The abstract of the latter:

“Thirteen unpublished lines of a Latin–Romani vocabulary in a manuscript in Munich represent the earliest document recording efforts to put words of Romani, ‘primarily an oral language’ (Matras 2002: 238), into writing. The Benedictine compiler and scribe of the list was familiar with important contemporary German scholars, a fact that may enhance the authenticity of his numerous excerpts and explain the almost scholarly approach to such exotic languages as Romani. It can be assumed that the ‘interviewer’ got his information in Vienna around 1515, preceding the often adduced vocabularies of Borde (1542), Ewsum (before 1570), Vulcanius (1597), Çelebi (1668), and Marsden (1785). He organized the results of his questioning neatly in groups, heavenly bodies, humans and animals, food, and cardinal numerals.”

Why Johannes von Grafing’s account in Opuscula Cod.graec. 582 a4 lists only seven of the zodiac signs remains curious.

A few points seem worth noting. First, the singularity of the event: as far as can currently be observed, the work seems not to have been influential in any way, which underlines the possibility of similar finds.
Second, the aforementioned almost scholarly approach seems to be attributable, at least to some extent, to education, hinting at “schools of thought”.
Third, Johannes von Grafing’s use of the diacritic ü seems noteworthy as well. Interestingly, the proposed use of diacritics was rejected by participants in a 1990s university survey of speakers’ spelling preferences.5

If we allow for a brief moment of speculation, let us imagine a learned6 scribe from around the 15th century with a similar, yet slightly more ambitious project: the standardization and compilation of ancient Romani tradition from oral and (hypothetically existing) older written sources in Aramaic, Greek, Arabic, Glagolitic (or more exotic) scripts, recording all sorts of Para-Romani languages and different Romani dialects from different ages. It seems quite a burden for our scribe to take on, and the book may very well turn out to be a “babel manuscript”.

A group of Romani people also called Erromintxela has a particularly interesting story: formerly referred to as “[…] ijitoak ‘Egyptians’, ungrianok ‘Hungarians’, or buhameak ‘Bohemians’”, they arrived in the Basque lands in the 15th century, integrated into Basque society, and developed a Para-Romani language using Basque grammar.
“The initial E- is the Basque prosthetic vowel, added because no Basque word may begin with an R-, and the final -a is the absolutive case suffix, used when citing a name. If this etymology is correct, it is a rare case of a native Romani name for themselves (an endonym) being borrowed by another language.”
A form that catches the eye is the verb “ajin / najin” for “to have / not to have”.7

Trivia: In 2012 the German gangster rapper “Haftbefehl” (warrant) had a chart hit with “Chabos wissen wer der Babo ist”. The track was praised and criticised at the same time for its language in mainstream media.
On finding “puer schabo” in the “Collectanæa”, it clicked for the author that the rapper might have employed the Romani term “chavô” for kid/lad/friend, and indeed a quick search turned up that Haftbefehl also used “Manische Sprache”,8 a Romani sociolect of the Frankfurt am Main area.
“Babo” was subsequently voted “German youth word of the year” in 2013. The etymological roots of the word point back to the Zaza language; the rest of the lyrics are a polyglot mix of German, English, French, Turkish, Kurdish, Arabic and Serbian phrases.


Towards a Unicode Transliteration Table for VMScript

If we look at Cappelli, Bischoff, etc. long enough, it does not seem an unusual notion at all that VMS symbols are derived from Latin letter shapes (I am trying to avoid the term “glyph” as it often gets misinterpreted1). If we read: “The script uses many ligatures and has many unique scribal abbreviations, along with many borrowings from Tironian notes2”, it is about the shapes of Insular Script, not VMScript. It seems virtually impossible to invent letter shapes out of the blue, without resemblance to anything known3.

If we did the statistics and identified most of them, we could also take a look at the Unicode charts, especially Latin Extended-A through Latin Extended-D, the Latin supplements, the MUFI recommendation etc., and try to locate the glyphs and their corresponding code points. Almost everything is there. Medievalists need a lot of glyphs to encode manuscripts, like “LATIN SMALL LETTER A INSULAR FORM, LATIN SMALL LETTER OPEN A CAROLINGIAN FORM, LATIN SMALL LETTER N WITH FLOURISH (an old friend if constructed with minims), LATIN SMALL LETTER T ROTUNDA ꞇ” for the basics, or more advanced, “LATIN ABBREVIATION SIGN SMALL CON DESCENDING ꝯ, LATIN ABBREVIATION SIGN SMALL IS ꝭ, BREVE BELOW”, all sorts of combining diacritics, contextual spacing modifiers, and last but not least, six different spaces.
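
A rough illustration of what “locating the code points” could look like in practice: the sketch below uses Python’s standard unicodedata module to print the code point and the formal Unicode name of three of the characters just mentioned. The choice of sample characters and the helper name describe are my own assumptions, and the formal names Unicode reports may well differ from the descriptive names quoted above.

```python
import unicodedata

# Three of the medievalist characters mentioned above, written as escapes
# so the example does not depend on font or editor support.
SAMPLES = ["\ua76f", "\ua76d", "\ua787"]  # ꝯ, ꝭ, ꞇ


def describe(char):
    """Return (code point label, formal Unicode name) for one character."""
    label = f"U+{ord(char):04X}"
    # Fall back to a placeholder if this Python build does not know the name.
    name = unicodedata.name(char, "<unnamed in this Unicode version>")
    return label, name


if __name__ == "__main__":
    for ch in SAMPLES:
        label, name = describe(ch)
        print(label, name)
```

The same helper can be pointed at any candidate character copied from a code chart, which makes it a cheap way to check a table entry before committing to it.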

What I’m hinting at:
A graphemic transliteration table could be constructed. We still do not care about meaning; we are simply looking for allographs, “look-alike glyphs”. There is no need to settle on a single verdict for a glyph; on the contrary, we note down variants. This will be very helpful later on, as will describing the glyphs verbosely.
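
To make the idea of a table that keeps variants a little more concrete, here is a minimal sketch of one possible data structure. Everything in it is hypothetical: the class name SignEntry, the label sign-001, the description and the two candidate code points are placeholders for illustration, not identifications of any actual VMS sign.

```python
from dataclasses import dataclass, field


@dataclass
class SignEntry:
    """One row of a hypothetical graphemic transliteration table."""
    sign_id: str        # our own working label, not a reading
    description: str    # verbose description of the shape
    candidates: list = field(default_factory=list)  # look-alike code points

    def add_candidate(self, char, note=""):
        # Record a variant instead of overwriting an earlier verdict.
        self.candidates.append((f"U+{ord(char):04X}", char, note))


# Placeholder pairings only; they illustrate the structure, nothing more.
table = {}
entry = SignEntry("sign-001", "reversed c-like curve sitting on the baseline")
entry.add_candidate("\u2184", "first look-alike")
entry.add_candidate("\ua76f", "second look-alike, kept alongside the first")
table[entry.sign_id] = entry
```

Keeping the candidates as a list rather than a single value is the whole point: the table records what a sign resembles without forcing a decision.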

While the MUFI recommendation contains a lot of Latin ligatures as code points, Unicode discourages the addition of new ligatures.
Contemporary font standards, mostly OpenType and SIL Graphite, allow for the composition of ligatures as part of smart font features. So we would try to express as many of the more complex VMS signs as possible as contextual ligatures, eventually making use of complex text layout. There are a lot of possibilities. It may be up to judgement in some cases. But of course this means we are also in need of a font supporting this.

We have constructed a sieve, and what remains in it are unique, unknown glyphs. If we wished to encode VMScript, these would go to a PUA, a Private Use Area of Unicode, preferably taking unpopulated code points.
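
As a small sketch of the “unpopulated code points” step, assuming the leftover signs simply carry working labels: the function below hands out code points from the Basic Multilingual Plane Private Use Area (U+E000 to U+F8FF) and skips any that are listed as already in use. The labels and the set of taken code points are invented; a real project would fill that set from whatever PUA assignments it wants to stay compatible with.

```python
import unicodedata

PUA_START, PUA_END = 0xE000, 0xF8FF  # BMP Private Use Area


def is_private_use(code):
    """True if the code point has the Unicode category 'Co' (private use)."""
    return unicodedata.category(chr(code)) == "Co"


def assign_pua(sign_ids, occupied=frozenset()):
    """Map each unidentified sign to a free PUA code point, in order."""
    mapping = {}
    code = PUA_START
    for sign_id in sign_ids:
        while code in occupied:      # skip code points someone else uses
            code += 1
        if code > PUA_END:
            raise ValueError("ran out of private use code points")
        mapping[sign_id] = code
        code += 1
    return mapping


# Hypothetical inputs: two leftover signs, one code point already taken.
leftover = ["unknown-01", "unknown-02"]
taken = {0xE000}
mapping = assign_pua(leftover, taken)
```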

Why is this of significance?
A Unicode transliteration table would in turn allow us to create a scholarly acceptable, palæographic (also: allographic) transcription of the VMS. There is still no diplomacy, no expansion of abbreviations, no judgment on meaning, we simply record what we see.
The difference is that the recorded glyphs do not map to ASCII “a”, “Z”, “#”, “/”, etc., as they do in EVA, but to their respective code points. This should help to avoid the misunderstandings EVA encourages, and allow scholars & information scientists alike to work with a reliable transcription. Wonderful things could be done, like judging allographic variation by mean distribution, e.g. “-is” vs. “-ris” (good that we noted variants before).
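
One simple reading of “judging allographic variation by mean distribution” is sketched below: for every page, take the share of one variant among all occurrences of the two variants, then average those shares. This is only an assumption about what such a measure could be; the function name variant_shares and the toy pages are invented, and the labels “is” and “ris” stand in for whatever code points the table assigns.

```python
from collections import Counter
from statistics import mean


def variant_shares(pages, variant_a, variant_b):
    """Per page, the share of variant_a among all a/b occurrences."""
    shares = []
    for tokens in pages:
        counts = Counter(t for t in tokens if t in (variant_a, variant_b))
        total = counts[variant_a] + counts[variant_b]
        if total:                    # skip pages where neither occurs
            shares.append(counts[variant_a] / total)
    return shares


# Invented toy data: each page is a list of transcribed sign labels.
pages = [
    ["is", "ris", "is", "x"],
    ["ris", "ris", "y"],
    ["z"],
]
shares = variant_shares(pages, "is", "ris")
average_share = mean(shares)
```

A share that stays flat across pages would suggest free variation, while a share that jumps between sections would be worth a closer look.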

Of course a lot depends on the transcription itself, and there are numerous options for tackling this. Accepted scholarly standards exist and should largely be followed or extended. Open Source software for collaborative work exists and needs to be evaluated. This shall be elaborated on in a follow-up post.

Fun with ſ

Who says Wikipedia can’t be fun? Linguists can be rather nerdy; here are some excerpts from the Talk page for the long s article:

Where the bee ſucks, there ſuck I;
In a cowſlip’s bell I lie;
There I couch when owls do cry.
On the bat’s back I do fly
After ſummer merrily.
–William Shakeſpeare

Long s had a dethender to begin with, and during the medieval era thcribeth gradually thortened the dethender part until long s had no dethender to thpeak of. Descenderless long esess are an example of atrophied letter forms. Oh gee, now I’ve completely broken the pattern.

May the ſ character bloſsom like the roſe

And, may the phyſical ſpacing of it one day improve.

Ðe letter ſ is ſweet! We ſhould use it more. Alſo, braſsſmith is correct. I þink it alſo makes the word “ſcrewed” look muć better. Wiþ a compoſe key on Linux, you can uſe Compoſ, f, s to get it.

Ðiſ iſ getting ſilly.
Gettiŋ ſilly, you ſay? I ſay we revive ſome of ðeſe ‘antique’ letters, and (of courſe) briŋ ðe “eszett (“ß”)” in from German. To me it’s quite a preßiŋ ißue. I þink we need more letters!

But, more seriously:

It is important to note that many languages, eſpecially Germanic ones, make words by compounding ſhorter words and word fragments. When a word is made from a part ending in s followed by another part, the compound word ſhould ſtill be written with a final s even though it is now inſide a word. The correct uſe of long verſus ſhort s can make the ſtructure clearer, and ſometimes remove ambiguity.
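
The rule in that last excerpt can be sketched in a few lines, under a strong simplification: the word is given already split into its parts, every lowercase s becomes ſ, and only the s that closes a part stays short. Historical practice knew further conventions that this ignores. The function name apply_long_s is my own, and the pair Wach + stube versus Wachs + tube is the ambiguity commonly cited for German.

```python
def apply_long_s(parts):
    """Join word parts, using ſ for every s that does not end a part."""
    out = []
    for part in parts:
        chars = []
        for i, ch in enumerate(part):
            is_last = i == len(part) - 1
            # Lowercase s becomes long s unless it closes its word part.
            chars.append("ſ" if ch == "s" and not is_last else ch)
        out.append("".join(chars))
    return "".join(out)


guard_room = apply_long_s(["Wach", "stube"])   # s opens the second part
wax_tube = apply_long_s(["Wachs", "tube"])     # s closes the first part
```

Written with only the round s, both compounds come out as the same string; with the distinction kept, the two readings are told apart on the page.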

Ash Wednesday

My apartment building has a nice feature: a giveaway place near the entrance. In the past I have picked up some remarkably interesting “abandoned” books from there.

Today I reaped “Sterben und Tod im Mittelalter” by Norbert Ohler (“Dying and Death in the Middle Ages”). A rather fitting find for the occasion of the day, and how much I am looking forward to browsing through it for a little read before slumber time. Fastnacht is over.

Seriously, it is a well-written, comprehensive book on all aspects of the topic, including lots of manuscript text references and imagery.

Voynich Heavy Inking

This was supposed to be a comment on Nick Pelling’s blog Cipher Mysteries, but it turned out too lengthy. So I decided to turn it into a blog post. It is of the “all out” type.


I am wondering about the use of the term “glyph” in the Voynich world. I suspect it is understood slightly differently than in Unicode terminology, i.e. “characters vs. glyphs”. More like in “hiero-glyphs”, I’m afraid.

I think what is “bad” about EVA is that it partly tries to transport meaning in shapes instead of code points, and that it encodes in (E)ASCII instead of Unicode.

If we look at the decipherment of the Copiale cipher, the simple complication of “tokenising” (substituting) German bi- & trigraphs such as “ch”, “tz”, “sch”, doubles such as “ll”, and so on with separate homophone sets disrupts statistical language guessing and more.
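
A toy sketch of why this disrupts the statistics: once multigraphs are treated as single units, the counts for their component letters shift or vanish, so the frequency profile no longer resembles the underlying language. The multigraph list is taken from the examples above; the function name tokenise and the sample word are invented, and the sketch leaves out the homophone sets entirely.

```python
from collections import Counter

# Longest first, so that "sch" is matched before "ch".
MULTIGRAPHS = ["sch", "ch", "tz", "ll"]


def tokenise(text, multigraphs=MULTIGRAPHS):
    """Split text into tokens, treating listed multigraphs as one unit."""
    tokens, i = [], 0
    while i < len(text):
        for m in multigraphs:
            if text.startswith(m, i):
                tokens.append(m)
                i += len(m)
                break
        else:
            # No multigraph starts here, so keep the single letter.
            tokens.append(text[i])
            i += 1
    return tokens


sample = "schall"                        # invented toy input
plain = Counter(sample)                  # per-letter counts
tokenised = Counter(tokenise(sample))    # per-token counts
```

In the plain count the letter l appears twice; after tokenising it does not appear at all, because both instances were absorbed into one “ll” token.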

Now EVA does exactly that when it introduces special characters which could be mapped otherwise in Unicode (like diacritics, dialectal variants, Tironian notes, Latin & scribal abbreviations, phonotactics [variable spacing] etc.) or be expressed as ligatures. Much worse, the mapping is to high ASCII chars instead of the Unicode Private Use Area. So, for example, we throw an unofficial &#163; (£, English pound sign) at the stats where it would eventually need a U+2184, Latin Small Letter Reversed C, etc. The neglect of ligatures, abbreviations and spacing does its deed.
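
Mechanically, moving such stand-ins to their intended code points is trivial once the table exists, as this sketch shows. The single pairing in it is just the example from the paragraph above, used for illustration; it is not an established mapping for any EVA variant, and the names REMAP and remap are my own.

```python
# Purely illustrative: the one pairing below repeats the example from the
# text and is not an established mapping of any EVA variant.
REMAP = {
    0x00A3: 0x2184,  # POUND SIGN -> LATIN SMALL LETTER REVERSED C
}


def remap(text, table=REMAP):
    """Replace stand-in characters with their intended code points."""
    return text.translate(table)


converted = remap("ab\u00a3")
```

The hard part is of course not the replacement but agreeing on the table.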

Otherwise little-regarded medieval techniques of linguistic obfuscation, for example code switching (using Latin, vernacular French, Italian etc. intermittently), and the layering of such “weak” methods certainly pose an obstacle. Still, some statistical observations are not impeded by the uncertainty of existing transcriptions, such as the fundamental one that there are no capitals in the, sorry for the pun, “so-called Voynichese script”.

The Voynichese character set is not an “in situ” creation, meaning it was not invented out of the blue, as this is an almost impossible task for reasons I cannot outline here.

“The script uses many ligatures and has many unique scribal abbreviations, along with many borrowings from Tironian notes” would describe it rather well, although this quote is nicked from the Wikipedia article about Insular Script.

Prescribed practice has not been tried so far. The most difficult problem of “no language, no alphabet” could be tackled digitally with a graphemic transliteration table first, followed by an allographic analysis (comparing the mean distribution of possible variants), encoding spatial positions on a character level, encoding emanation types (e.g. inking density for determining writing order), ambiguities, etc. Of course encoding the imagery, marginalia, physical properties etc. would be part of the VMS ontology, no matter whether in TEI P5 or standoff-property style.
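
To hint at what the standoff-property style could mean here, a minimal sketch: the transcribed signs stay untouched as a base sequence, and every observation (position, inking, ambiguity and so on) is a separate record pointing at a span of that sequence. The class name Annotation, the layer names and the sample values are all invented for illustration and do not follow any particular existing schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A standoff property: a statement about a span of the base text."""
    start: int   # index of the first sign covered
    end: int     # index one past the last sign covered
    layer: str   # e.g. "position", "inking", "ambiguity"
    value: str


def layers_at(annotations, index):
    """Return all annotations, from any layer, that cover one sign."""
    return [a for a in annotations if a.start <= index < a.end]


# Invented example: five placeholder signs with three annotation layers.
base = ["s1", "s2", "s3", "s4", "s5"]
annotations = [
    Annotation(0, 5, "position", "line 1"),
    Annotation(1, 3, "inking", "heavy"),
    Annotation(2, 3, "ambiguity", "s3 or s4?"),
]
covering_third_sign = layers_at(annotations, 2)
```

Because the layers never touch the base sequence, they can overlap freely and be added or withdrawn independently.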

Multi-level is the keyword, but I realise this is getting much too lengthy for now while not even beginning to outline the task completely. It means tons of work. I would like to avoid a certain proverb I find ghastly, but it is true:

A lot of bathtubs will have to be unplugged.

The whole duty
of Typography, as of
Calligraphy,
is to communicate
to the imagination,
without loss
by the way,
the thought or image
intended to be communicated
by the Author.

توانا بود هر که دانا بود
(“Capable is whoever is wise.”)

Shahnameh, Abolqasem Ferdowsi