Towards a Unicode Transliteration Table for VMScript

If we look at Cappelli, Bischoff, etc. long enough, it does not seem an unusual notion at all that VMS symbols are derived from Latin letter shapes (I am trying to avoid the term “glyph” as it often gets misinterpreted [1]). If we read: “The script uses many ligatures and has many unique scribal abbreviations, along with many borrowings from Tironian notes” [2], it is about the shapes of Insular Script, not VMScript. It seems virtually impossible to invent letter shapes out of the blue, without resemblance to anything known [3].

If we did the statistics and identified most of them, we could also take a look at the Unicode charts, especially Latin Extended-A through Latin Extended-D, the Latin supplements, the MUFI recommendations etc., and try to locate the glyphs and their corresponding code points. Almost everything is there. Medievalists need a lot of glyphs to encode manuscripts, like “LATIN SMALL LETTER A INSULAR FORM, LATIN SMALL LETTER OPEN A CAROLINGIAN FORM, LATIN SMALL LETTER N WITH FLOURISH (an old friend if constructed with minims), LATIN SMALL LETTER T ROTUNDA ꞇ” for the basics, or more advanced, “LATIN ABBREVIATION SIGN SMALL CON DESCENDING ꝯ, LATIN ABBREVIATION SIGN SMALL IS ꝭ, BREVE BELOW”, all sorts of combining diacritics, contextual spacing modifiers, and last but not least, 6 different spaces.

What I’m hinting at:
A graphemic transliteration table could be constructed. We still do not care about meaning; we are simply looking for allographs, “alike looking glyphs”. There is no need to settle on a single verdict for a glyph. On the contrary, we are noting down variants. This will be very helpful later on, as will describing the glyphs verbosely.
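A minimal sketch of what one row of such a table could look like, in Python. The sign identifiers (`vms-001` etc.) and the pairings of signs with code points are placeholders I made up for illustration, not findings; only the Unicode characters themselves are real. Each row keeps a verbose description and a list of look-alike candidates instead of a single verdict.

```python
import unicodedata
from dataclasses import dataclass, field
from typing import List


@dataclass
class SignEntry:
    """One row of the table: a VMS sign and the Unicode characters it resembles."""
    sign_id: str       # hypothetical local identifier, not an EVA code
    description: str   # verbose description of the shape
    candidates: List[str] = field(default_factory=list)  # look-alikes, no verdict

    def candidate_names(self) -> List[str]:
        # Resolve each candidate to "U+XXXX NAME" for a human-readable table.
        return [f"U+{ord(c):04X} {unicodedata.name(c, '<unnamed>')}"
                for c in self.candidates]


# Placeholder rows: the pairings below are illustrative assumptions only.
TABLE = [
    SignEntry("vms-001", "9-shaped sign with descender", ["\uA76F", "9"]),
    SignEntry("vms-002", "reversed-c shape", ["\u2184"]),
]

for entry in TABLE:
    print(entry.sign_id, entry.description, entry.candidate_names())
```

Keeping the candidates as a list is the point: variants are recorded side by side, and any later judgement can be made on top of the table without rewriting it.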

While the MUFI recommendation contains a lot of Latin ligatures as code points, Unicode discourages the addition of new ligatures.
Contemporary font standards, mostly OpenType and SIL Graphite, allow for the composition of ligatures as part of smart font features. So we would try to express as many of the more complex VMS signs as possible as contextual ligatures, eventually making use of complex text layout. There are a lot of possibilities. It may be up to judgement in some cases. But of course this means we are also in need of a font supporting this.

We have constructed a sieve, and what remains in it are unique, unknown glyphs. If we wished to encode VMScript, these would go to a PUA, a Private Use Area of Unicode, preferably taking unpopulated code points.
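To make the “unpopulated code points” part concrete, here is a small Python sketch. It hands out code points from the Private Use Area of the Basic Multilingual Plane (U+E000 to U+F8FF) and skips any that are already taken. The `occupied` set is an assumption: in practice it would have to be filled from whatever PUA assignments (MUFI or others) one wants to stay compatible with. The function name and sign identifiers are made up.

```python
PUA_START, PUA_END = 0xE000, 0xF8FF  # Private Use Area in the BMP


def allocate_pua(sign_ids, occupied=frozenset(), start=PUA_START):
    """Assign each leftover sign a free PUA character, skipping occupied code points."""
    mapping = {}
    cp = start
    for sign_id in sign_ids:
        while cp in occupied:          # step over code points someone else uses
            cp += 1
        if cp > PUA_END:
            raise ValueError("BMP Private Use Area exhausted")
        mapping[sign_id] = chr(cp)
        cp += 1
    return mapping


# Hypothetical example: two leftover signs, one code point already in use.
leftovers = allocate_pua(["vms-101", "vms-102"], occupied={0xE000})
for sign_id, char in leftovers.items():
    print(sign_id, f"U+{ord(char):04X}")
```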

Why is this of significance?
A Unicode transliteration table would in turn allow us to create a scholarly acceptable, palæographic (also: allographic) transcription of the VMS. There is still no diplomacy, no expansion of abbreviations, no judgment on meaning, we simply record what we see.
The difference is that the recorded glyphs do not map to ASCII “a”, “Z”, “#”, “/”, etc., as they do in EVA, but to their respective code points. This should help to avoid the misunderstandings EVA encourages, and allow scholars and information scientists alike to work with a reliable transcription. Wonderful things could be done, like judging allographic variation by mean distribution, e.g. “-is” vs. “-ris” (good that we noted variants before).
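As a rough illustration of the “-is” vs. “-ris” idea, the following Python sketch counts how often each of a set of competing endings occurs and returns their relative shares. The word list is toy data I invented for the example and has nothing to do with the VMS; the function name is hypothetical too. Longer endings are tested first so that a word in “-ris” is not also counted under “-is”.

```python
from collections import Counter


def ending_distribution(words, endings):
    """Relative frequency of each ending among the words that carry one of them."""
    by_length = sorted(endings, key=len, reverse=True)  # longest match wins
    counts = Counter()
    for word in words:
        for ending in by_length:
            if word.endswith(ending):
                counts[ending] += 1
                break
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ending: counts[ending] / total for ending in endings}


# Toy data, not a transcription.
sample = ["amoris", "regis", "maris", "canis", "et"]
print(ending_distribution(sample, ["is", "ris"]))
```

The same counting could be run per page, per section or per scribe hand, which is where comparing distributions would start to become interesting.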

Of course a lot depends on the transcription itself, and there are numerous options for how to tackle this. Accepted scholarly standards exist and should largely be followed or extended. Open Source software for collaborative work exists and needs to be evaluated. This shall be elaborated on in a follow-up post.

Fun with ſ

Who says Wikipedia can’t be fun? Linguists can be rather nerdy. Here are some excerpts from the Talk page for the long s article:

Where the bee ſucks, there ſuck I;
In a cowſlip’s bell I lie;
There I couch when owls do cry.
On the bat’s back I do fly
After ſummer merrily.
–William Shakeſpeare

Long s had a dethender to begin with, and during the medieval era thcribeth gradually thortened the dethender part until long s had no dethender to thpeak of. Descenderless long esess are an example of atrophied letter forms. Oh gee, now I’ve completely broken the pattern.

May the ſ character bloſsom like the roſe

And, may the phyſical ſpacing of it one day improve.

Ðe letter ſ is ſweet! We ſhould use it more. Alſo, braſsſmith is correct. I þink it alſo makes the word “ſcrewed” look muć better. Wiþ a compoſe key on Linux, you can uſe Compoſ, f, s to get it.

Ðiſ iſ getting ſilly.
Gettiŋ ſilly, you ſay? I ſay we revive ſome of ðeſe ‘antique’ letters, and (of courſe) briŋ ðe “eszett (“ß”)” in from German. To me it’s quite a preßiŋ ißue. I þink we need more letters!

But, more seriously:

It is important to note that many languages, eſpecially Germanic ones, make words by compounding ſhorter words and word fragments. When a word is made from a part ending in s followed by another part, the compound word ſhould ſtill be written with a final s even though it is now inſıde a word. The correct uſe of long verſus ſhort s can make the ſtructure clearer, and ſometimes remove ambiguity.

Ash Wednesday

My apartment building has the nice installation of a giveaway place near the entrance. I picked up some remarkably interesting “abandoned” books from there, in the past.

Today I reaped “Sterben und Tod im Mittelalter” by Norbert Ohler (Dying and Death in the Middle Ages). A rather fitting find for the occasion of the day, and how much I am looking forward to browsing through it for a little read before slumber time. Fastnacht is over.

Seriously, it is a well written, comprehensive book on all aspects of the topic, including lots of manuscript text references and imagery.

Voynich Heavy Inking

This was supposed to be a comment on Nick Pelling’s blog ciphermysteries, but it turned out too lengthy. So I decided to turn it into a blog post. It is of the “all out” type.


I am wondering about the use of the term “glyph” in the Voynich world. I suspect it is understood slightly differently than in Unicode terminology, i.e. “characters vs. glyphs”. More like in “hiero-glyphs”, I’m afraid.

I think what is “bad” about EVA is that it partially tries to transport meaning in shapes instead of code points, and that it encodes in (E)ASCII instead of Unicode.

If we look at the Copiale cipher decipherment, the simple complication of “tokenising” (substituting) German bi- and trigraphs “ch”, “tz”, “sch”, and doubles “ll” and so on in separate homophone sets disrupts statistical language guessing and more.

Now EVA does exactly that when it introduces special characters which could be mapped otherwise in Unicode (like diacritics, dialectal variants, Tironian notes, Latin and scribal abbreviations, phonotactics [variable spacing] etc.) or be expressed as ligatures. Much worse, the mapping is to high ASCII chars instead of the Unicode Private Use Area. So e.g. we throw an unofficial &#163; (£, English pound sign) at the stats where it would eventually need a U+2184, Latin Small Letter Reversed C, etc. The neglect of ligatures, abbreviations and spacing does its deed.
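The pound sign example can be sketched in a few lines of Python. The assumption here is that the transcription arrives as Latin-1 bytes, so byte 163 decodes to “£”; the remapping table holds only the one pair mentioned above and would need to be filled in properly. The function name and sample bytes are made up.

```python
# Maps code points of the "high ASCII" stand-ins to the intended characters.
# Only the pair named in the text is listed; everything else is left untouched.
REMAP = {
    0x00A3: 0x2184,  # POUND SIGN -> LATIN SMALL LETTER REVERSED C
}


def remap_transcription(raw: bytes) -> str:
    """Decode Latin-1 bytes (an assumption) and swap stand-ins for real code points."""
    return raw.decode("latin-1").translate(REMAP)


sample = b"ab\xa3cd"            # invented sample bytes, not real transcription data
converted = remap_transcription(sample)
print([f"U+{ord(c):04X}" for c in converted])
```

After such a pass, character statistics count a reversed c where there is one, instead of a currency sign that only happened to be available on the keyboard.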

While otherwise little-regarded features of medieval crypto techniques of linguistic obfuscation, for example code switching (using Latin, vernacular French, Italian etc. intermittently) and the layering of these “weak” methods, certainly pose an obstacle, some statistical observations are not impeded by the uncertainty of existing transcriptions, like the fundamental one that there are no capitalizations in, sorry for the pun, the “so-called Voynichese script”.

The Voynichese character set is not an “in situ” creation, meaning it was not invented out of the blue, as this is an almost impossible task for reasons I cannot outline here.

“The script uses many ligatures and has many unique scribal abbreviations, along with many borrowings from Tironian notes” would describe it rather well, while this quote is nicked from the Wikipedia article about Insular Script.

Prescribed practice hasn’t been tried so far. The most difficult problem of “no language, no alphabet” could be tackled digitally with a graphemic transliteration table first, followed by an allographic analysis (comparing the mean distribution of possible variants), encoding spatial positions on a character level, encoding emanation types (e.g. inking density for determining writing order), ambiguities, etc. Of course, encoding the imagery, marginalia, physical properties etc. would be part of the VMS ontology, no matter whether in TEI P5 or standoff-property style.
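To give an idea of the standoff flavour, here is a very small Python sketch: the transcribed characters stay a plain string, and every further level (line, inking, ambiguity) is a separate property pointing at a character range. All names, the locus label and the values are invented for illustration; this is neither TEI P5 nor any existing standoff format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Property:
    """A standoff annotation over the half-open character range [start, end)."""
    start: int
    end: int
    kind: str    # e.g. "line", "inking", "ambiguity" (hypothetical layer names)
    value: str


def properties_at(props: List[Property], index: int,
                  kind: Optional[str] = None) -> List[Property]:
    # Return every property covering the character at `index`, optionally by layer.
    return [p for p in props
            if p.start <= index < p.end and (kind is None or p.kind == kind)]


TEXT = "\uA76F\u2184\uA76D"                    # three placeholder characters
LAYERS = [
    Property(0, 3, "line", "f1r.1"),           # hypothetical locus label
    Property(0, 1, "inking", "heavy"),
    Property(1, 2, "ambiguity", "U+2184 or c"),
]

for i, char in enumerate(TEXT):
    print(f"U+{ord(char):04X}", [(p.kind, p.value) for p in properties_at(LAYERS, i)])
```

New levels can be added without touching the text or the existing layers, which is what makes the multi-level approach workable for collaborative encoding.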

Multi-level is the keyword, but I realise this is getting much too lengthy for now while not even beginning to outline the task completely. It means tons of work. I would like to avoid a certain proverb I find ghastly, but it is true:

A lot of bathtubs will have to be unplugged.

The whole duty
of Typography, as of Calligraphy,
is to communicate
to the imagination,
without loss
by the way,
the thought or image
to be communicated
by the Author.

An ancient metaphor: thought is a thread, and the raconteur is a spinner of yarns — but the true storyteller, the poet, is a weaver. The scribes made this old and audible abstraction into a new and visible fact. After long practice, their work took on such an even, flexible texture that they called the written page a textus, which means cloth.

Robert Bringhurst, “The Elements of Typographic Style”

The Voynich Undusting

Lunch break is over! Now for a short coffee break. There is a book on the coffee table, a coffee table book, so to say. With strangely familiar-looking, letter-like symbols and incomprehensible naive imagery. So much for shortness, but let me type along.

I think the Yale facsimile edition of the VMS is excellent. Not so much in terms of the “crispness” of the print, which rather disappoints me. Rather, it enables me to “get related” with the physical appearance of the volume, almost as if I were able to leaf through it. A very different experience from watching it onscreen.

I need to do a few things differently on this blog, in terms of personal resource allocation. That means understanding the true nature of the weblog format, which is to be an aid to condensing my thoughts. You are welcome to accompany me on my breadcrumb trail. Please excuse brevity from now on.