

Perl scripts for NooJ-oriented linguists

developed by François Trouilleux

While working on the DM, I wrote a few Perl scripts to deal with the dictionary. I think some of them might be useful to the NooJ community. Their functionality is briefly described below.
I would be glad to share these programs, but rather than making them directly available for download, I ask interested people to contact me. The reason is that these small programs were designed for my specific needs and tested on only one dictionary; I do not consider them good enough for a more formal release.
Given a NooJ inflected dictionary, this program produces a copy in the lexc format of the Xerox Finite State Tool (XFST); see, e.g., for the DM: dm-lexc_sample.txt. You can then compile the dictionary with XFST and use all the functionality of that platform.
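To illustrate the kind of conversion described above, here is a minimal sketch in Python. It is not the Perl script itself. It assumes that each line of the inflected dictionary has the shape `form,lemma,CATEGORY+feature+...`, that lines starting with `#` are comments, and that no form or lemma contains a comma. The function names (`lexc_escape`, `to_lexc`) and the set of escaped characters are illustrative choices, and the output is a plain single-lexicon lexc file that may differ from dm-lexc_sample.txt.

```python
# Characters escaped with "%" in lexc entries (a conservative, assumed set).
LEXC_SPECIAL = set(' %:;!#<>0"')


def lexc_escape(text):
    """Prefix every character that lexc treats specially with '%'."""
    return "".join("%" + ch if ch in LEXC_SPECIAL else ch for ch in text)


def to_lexc(lines):
    """Turn 'form,lemma,CAT+feat+...' lines into the text of a lexc file."""
    entries, tags = [], set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        form, lemma, info = line.split(",", 2)
        # Each category or feature becomes a multicharacter symbol "+X".
        feat_tags = ["+" + lexc_escape(f) for f in info.split("+")]
        tags.update(feat_tags)
        upper = lexc_escape(lemma) + "".join(feat_tags)
        # Upper side: lemma plus tags; lower side: the inflected form.
        entries.append(f"{upper}:{lexc_escape(form)} # ;")
    header = "Multichar_Symbols " + " ".join(sorted(tags))
    return "\n".join([header, "", "LEXICON Root"] + entries) + "\n"


if __name__ == "__main__":
    print(to_lexc(["abaissa,abaisser,V+PS+3+s"]), end="")
```

Putting the lemma and tags on the upper side and the inflected form on the lower side means that, once compiled, looking up a form returns its analysis.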
Given a NooJ inflected dictionary, this program produces, for each category, the list of property values actually used in the dictionary, e.g., for the DM (version 1.0.1): dm-property_values.txt. This is useful for spotting typing errors in property values and for preparing a properties.def file.
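The inventory described above could be sketched as follows in Python. This is not the Perl script itself. It assumes lines of the shape `form,lemma,CATEGORY+prop+name=value+...`, with `#` starting a comment line. The function name `property_values` and the pseudo-property label `(bare)`, used for features written without `=`, are hypothetical.

```python
from collections import defaultdict


def property_values(lines):
    """Map each category to {property name: set of values found in the entries}."""
    table = defaultdict(lambda: defaultdict(set))
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        _form, _lemma, info = line.split(",", 2)
        category, *props = info.split("+")
        for prop in props:
            name, sep, value = prop.partition("=")
            if sep:
                table[category][name].add(value)
            else:
                # A feature such as "+m" has no explicit property name.
                table[category]["(bare)"].add(name)
    return table


if __name__ == "__main__":
    sample = ["chats,chat,N+m+p", "mange,manger,V+PR+3+s+Aux=avoir"]
    for category, props in sorted(property_values(sample).items()):
        for name, values in sorted(props.items()):
            print(category, name, " ".join(sorted(values)), sep="\t")
```

A value that occurs only once or twice in such a listing (say `masc` next to hundreds of `m`) is a likely typing error.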
Given a non-inflected dictionary, this program produces a tab-separated text table with four columns:
  • paradigm name,
  • categories the paradigm applies to,
  • number of times the paradigm is used,
  • list of baseforms the paradigm applies to.
The list of baseforms may be limited to cases where it does not exceed some value. The text file may then be opened with MS Excel or OpenOffice Calc and used as a standard spreadsheet. E.g., with the DM, I obtain this Excel file (with at most 20 baseforms): dm-modeles.xls.
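The four-column table described above can be sketched in Python as follows. This is not the Perl script itself. It assumes entries of the shape `baseform,CATEGORY+FLX=PARADIGM+...`, with `#` starting a comment line. The function name `paradigm_table`, the parameter `max_baseforms`, and the choice to leave the fourth column empty when the list is too long are illustrative assumptions.

```python
from collections import defaultdict


def paradigm_table(lines, max_baseforms=20):
    """Return a tab-separated table: paradigm, categories, count, baseforms."""
    cats = defaultdict(set)    # paradigm -> categories it is used with
    bases = defaultdict(list)  # paradigm -> baseforms that use it
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        baseform, info = line.split(",", 1)
        category, *props = info.split("+")
        for prop in props:
            if prop.startswith("FLX="):
                paradigm = prop[len("FLX="):]
                cats[paradigm].add(category)
                bases[paradigm].append(baseform)
    rows = []
    for paradigm in sorted(bases):
        forms = bases[paradigm]
        # List the baseforms only when there are at most max_baseforms of them.
        listed = ", ".join(forms) if len(forms) <= max_baseforms else ""
        rows.append("\t".join(
            [paradigm, " ".join(sorted(cats[paradigm])), str(len(forms)), listed]
        ))
    return "\n".join(rows) + "\n"


if __name__ == "__main__":
    sample = ["chat,N+FLX=CRAYON+m", "chien,N+FLX=CRAYON+m", "aimer,V+FLX=AIMER"]
    print(paradigm_table(sample), end="")
```

Writing the result to a `.txt` or `.tsv` file gives something a spreadsheet program can open directly, with one paradigm per row.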