
Jan Tokarski & Zygmunt Saloni

Schematic Reverse Order Index of Polish Word Forms

[from the editor’s foreword]

The Schematic Reverse Order Index of Polish Word Forms (Schematyczny indeks a tergo polskich form wyrazowych, abbr. SIaT) aims to reflect the phenomena not of a given text canon, but of a whole linguistic system understood as potential. The goal of this work is to assign to a text word the relevant grammatical interpretation and, subsequently, the entry form of a suitable lexeme (technically called the headword). Polish words are represented herein not in whole, but by their endings of various lengths. These endings are arranged in reverse alphabetical order (in Polish referred to as a tergo).
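The idea of an index keyed by word endings can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the names `a_tergo_key`, `TOY_INDEX` and `analyse` are hypothetical, the four entries are invented, and the longest-ending-first lookup is one plausible strategy, not a description of how SIaT itself is organized or used. Python's default string comparison also ignores Polish collation, which is harmless here because the toy endings contain no diacritics.

```python
# Toy illustration only: the entries below are invented for this sketch and
# are NOT taken from the SIaT data, whose actual format is not described here.

def a_tergo_key(ending: str) -> str:
    """Sort key for reverse (a tergo) order: compare from the last letter."""
    return ending[::-1]

# Hypothetical index: ending -> (grammatical interpretation, headword ending)
TOY_INDEX = {
    "ami": ("noun, instrumental plural", "a"),
    "ach": ("noun, locative plural", "a"),
    "om": ("noun, dative plural", "a"),
    "y": ("noun, genitive singular", "a"),
}

def analyse(word: str, index: dict):
    """Return (interpretation, headword) for the longest matching ending.

    Returns None when no ending of the word is listed in the index.
    """
    for start in range(len(word)):  # start=0 is the longest candidate ending
        ending = word[start:]
        if ending in index:
            interpretation, headword_ending = index[ending]
            # Rebuild the headword: stem plus the ending of the entry form.
            return interpretation, word[:start] + headword_ending
    return None

# Endings listed in reverse alphabetical order, as in an a tergo index.
ordered_endings = sorted(TOY_INDEX, key=a_tergo_key)
```

With these toy entries, `analyse("kobietami", TOY_INDEX)` yields the interpretation "noun, instrumental plural" together with the headword `kobieta`. A real index would need to handle ambiguous endings and stem alternations, which this sketch deliberately leaves out.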

This composition was designed by Jan Tokarski before 1973, when the edition of the Dictionary of Polish (ed. W. Doroszewski) had already been completed but the reverse order index for that dictionary had not yet been finished. J. Tokarski may have tried to persuade a publishing house, or rather an academic institution, to sponsor the publication of his work. I have no definite information on this matter; I can only suppose, with some certainty, that he tried to gain the support of Witold Doroszewski, but his efforts were unsuccessful. […]

Immediately after the Index had been compiled, the only people who showed interest in using it were several computer scientists conducting research on natural language processing (in this case Polish). The author granted them access to the manuscript, and they were strongly convinced of the work’s value and innovative character. One of the researchers, Janusz Bień, wrote that Tokarski’s index would be an irreplaceable tool for inflectional analysis and the recognition of Polish words. It is indeed to him that we owe further developments of the work. The second IT specialist, Stanisław Waligórski, used the index as a data source for the automatic analysis of Polish texts. He assembled a short extract which covered only the initial elements of Tokarski’s compilation. The extract proved a great help to me in navigating the manuscript while I was creating and checking the preliminary typescript.

The issue of the schematic reverse order index was revived in 1980. After some preparatory discussions, professor Tokarski handed the manuscript to me for further elaboration.

The matter was far from easy.

First of all, before the compilation of the work could be undertaken, it had to be typed, and to do this one had to read through the manuscript. The manuscript, however, apart from a 71-page fragment (the unfinished letter “y”) that had been rewritten as a sample for presenting the work, was a barely legible rough draft, written sloppily and densely with a pen on different kinds of paper (partly on tissue paper) and, what is more, on both sides of the sheets. (I handed the manuscript over to the Archive of the Polish Academy of Sciences in 1998. There, it is kept in the “Collection of biographical documents to the history of Polish culture III-369” [Polish: „Kolekcja dokumentów biograficznych do dziejów kultury polskiej III-369”], register entry no. 19.) […]

Unfortunately, the compilation of the text took quite a while and really began only after the author’s death (16 January 1982). After a preliminary typescript had been made in 1982-83 on a mechanical typewriter (this tremendous amount of work was done by Ms. Anna Maliszewska), it turned out that although the author seemed to have completed his work, its contents still left much room for improvement. […]

Already in the initial phase of the editing work, I decided to use a computer to compile the final version of the Index. Relatively soon, in 1983, I was able to access a MERA-400 minicomputer, courtesy of my colleagues from the University of Warsaw Institute of Informatics. At that point, I started to cooperate closely with Krzysztof Szafran, who worked on the parameters and necessary improvements for the standard editing and formatting programs I was then using; he also played his part in operating the computer. In the later stage of my work, Mr Szafran proved to be an active and creative assistant in the field of machine work and my main IT consultant.

The text of the Index was entered into the computer in its entirety between 1983 and 1987; it also underwent all necessary modifications. The technical work was partly financed by the Polish Academy of Sciences Institute of Polish, as part of subject MR.III.12 and the departmental subject RPBP III 24 coordinated by the Białystok Branch of the University of Warsaw.

At that time, Jan Tokarski’s Schematic Reverse Order Index of Polish Word Forms was also added to the publishing plan of the Polish Academy of Sciences Institute of Polish. The complete, yet not entirely finalized text of the Index (in the form of a primitive computer printout from the MERA-400 machine) was handed to Prof. Roman Laskowski, who was to review it. In his thoroughly argued critical review, he indicated many of the compilation’s weak points which, in his opinion, disqualified it from being published in its current form and required further improvement on the editorial as well as the analytical level.

A large part of the enhancements was implemented right after the publication of the review. This was possible thanks to a previously planned systematic overview of selected grammatical problems, which was conducted in a series of master’s theses at the Białystok Branch of the University of Warsaw, and whose results were directly included in the later versions of the Index. The authors are as follows: Joanna Gradkowska (plurale tantum nouns), Katarzyna Krzywińska (masculine nouns declined according to feminine inflectional patterns), Henryka Namiotko (masculine nouns declined according to neuter inflectional patterns), Bożena Trochimczyk (numerals and subjective pronouns), Maria Puchlik (nouns declined like adjectives), Elżbieta Zajkowska (comparative forms of adjectives and adverbs), Alina Toczyłowska (short forms in adjectival declension), Walentyna Prześniak (adjectival participles), Ewa Szumska (present verb forms), Wanda Niewińska (past verb forms), Danuta Teresa Jankowska (imperative forms), Ewa Kiełbasa (insertional “e” in verb prefixes). I owe sincere thanks to all the people who helped me to accomplish this laborious venture. Nor can I forget the colleagues who willingly offered their advice and support during the entire course of the project. They are: Urszula Andrejewicz, Mirosław Bańko, Janusz Bień, Włodzimierz Gruszczyński, Dorota Kopcińska, Marek Świdziński, Ewa Teleżyńska.

[…] a fortunate coincidence enabled the Schematic Reverse Order Index of Polish Word Forms to be published by the reorganized Office of Polish Dictionaries at the Polish Scientific Publishers PWN (Polish: Redakcja Słowników Języka Polskiego Wydawnictwa PWN), with which Jan Tokarski had a strong emotional bond, since he had worked there for 20 years on the 11-volume Polish Academy of Sciences Dictionary of Polish (edited by W. Doroszewski).

In 1991 I proposed to PWN that the work be published, while the Białystok Branch of the University of Warsaw offered a subsidy which allowed the work to commence. The publication was made possible by a further subsidy from the Scientific Research Committee (Polish: Komitet Badań Naukowych), granted on 5 October 1992. For some time, part of the technical work was conducted by Izolda Lewtak on behalf of the PWN publishing house. Dorota Kopcińska, Halina Lipińska and Włodzimierz Gruszczyński (who, owing to his profound knowledge of Polish declension, helped me greatly in my work on many difficult fragments concerning noun forms) agreed to read through the nearly final version of the text free of charge. A great gesture of good will also came from Jan Tokarski’s heirs, who relinquished their copyright royalties.

The first edition of the Index appeared at the beginning of 1993.

All further improvements to the work were largely inspired by readers’ suggestions and remarks. I owe special thanks to Krzysztof Szafran, Mariusz Olko and Robert Wołosz, who loaded the contents of the Index into morphological analysis programs for Polish and came up with long lists of mistakes and omissions.

The form of the Index’s second edition was also distinctly influenced by the cooperation with Marcin Woliński. While preparing its content, he conducted several computer-aided tests, which also pinpointed many faults.

The second edition appeared at the end of 2001.

Zygmunt Saloni

Translated by Marcin Werelich

The data of the second edition of the Schematic Reverse Order Index of Polish Word Forms are released under the terms of the GNU General Public License version 3.

The data: SIaT_v2.0.zip