LRL-Diffusion

Changing the world, one site at a time…

Perl scripts for NooJ-oriented linguists


developed by François Trouilleux


While working on the DM, I wrote a few Perl scripts to process the dictionary. Some of them may be useful to the NooJ community, so their functionality is briefly described below.
I would be glad to share these programs. However, rather than making them available for download, I ask interested people to contact me. These little programs were designed for my specific needs and have been tested on only one dictionary, so I do not consider them mature enough for a more formal release.
nooj-flx2lexc
Given a NooJ inflected dictionary, this program produces a copy of it in the lexc format of the Xerox Finite State Tool (XFST) (cf. http://www.stanford.edu/~laurik/fsm...). For example, for the DM: dm-lexc_sample.txt. You can then compile the dictionary with XFST and use all the functionality of that platform.
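As a rough illustration of this kind of conversion (not the nooj-flx2lexc script itself, which is written in Perl and not published), here is a minimal Python sketch. It assumes that each inflected entry has the shape form,lemma,CAT+prop+prop (an empty lemma meaning the lemma equals the form), that lines starting with # are comments, and that forms contain no escaped commas. The names to_lexc, parse_inflected and lexc_escape, as well as the set of escaped characters, are hypothetical. Each entry becomes a lexc line lemma+CAT+props:form # ; in a single Root lexicon, and every tag is declared under Multichar_Symbols.

```python
"""Hypothetical sketch: convert NooJ inflected-dictionary lines to lexc.

Assumption: each entry looks like 'form,lemma,CAT+prop1+prop2', an empty
lemma means lemma == form, '#' starts a comment line, and forms contain
no escaped commas.
"""

# Characters prefixed with '%' so that lexc reads them literally ('0' would
# otherwise mean epsilon). Assumption: escaping an ordinary character is
# harmless in lexc, so this set errs on the side of caution.
LEXC_SPECIAL = set('%!;:<>0#"{} ')


def lexc_escape(text):
    """Prefix every lexc-special character with '%'."""
    return "".join("%" + ch if ch in LEXC_SPECIAL else ch for ch in text)


def parse_inflected(line):
    """Split 'form,lemma,CAT+p1+p2' into (form, lemma, ['+CAT', '+p1', ...])."""
    form, lemma, info = line.split(",", 2)
    tags = ["+" + t for t in info.split("+") if t]
    return form, lemma or form, tags


def to_lexc(lines):
    """Return a lexc source string: lexical side = lemma+tags, surface = form."""
    entries, symbols = [], set()
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blank and comment lines
            continue
        form, lemma, tags = parse_inflected(line)
        tags = [lexc_escape(t) for t in tags]
        symbols.update(tags)  # every tag becomes a multicharacter symbol
        upper = lexc_escape(lemma) + "".join(tags)
        entries.append(f"{upper}:{lexc_escape(form)} # ;")
    header = ["Multichar_Symbols " + " ".join(sorted(symbols)), "", "LEXICON Root"]
    return "\n".join(header + entries) + "\n"


if __name__ == "__main__":
    sample = ["# comment", "artists,artist,N+Hum+p", "artist,,N+Hum+s"]
    print(to_lexc(sample))
```

Putting the lemma and tags on the upper (lexical) side and the inflected form on the lower (surface) side follows the usual Xerox convention for analysers, so applying the compiled network upward to artists should yield artist+N+Hum+p.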
nooj-flx2property_values
Given a NooJ inflected dictionary, this program produces, for each category, the list of property values actually used in the dictionary. For example, for the DM (version 1.0.1): dm-property_values.txt. This is useful for spotting typos in property values, and as a starting point if you want to write a properties.def file.
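The core of such a property inventory can be sketched in a few lines of Python, under the same assumed entry shape (form,lemma,CAT+prop+...). This is not the author's script; the functions property_values and report are hypothetical. The sketch also counts how often each value occurs in each category, on the assumption that a value seen only once is a good typo candidate.

```python
"""Hypothetical sketch: list the property values used by each category.

Assumption: entries look like 'form,lemma,CAT+prop1+prop2' and lines
starting with '#' are comments.
"""
from collections import Counter, defaultdict


def property_values(lines):
    """Map each category to a Counter of the property values it uses."""
    values = defaultdict(Counter)
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blank and comment lines
            continue
        _form, _lemma, info = line.split(",", 2)
        category, *props = info.split("+")
        values[category].update(p for p in props if p)
    return values


def report(values):
    """Render one block per category: the category, then tab-indented value/count lines."""
    out = []
    for category in sorted(values):
        out.append(category)
        for value, count in sorted(values[category].items()):
            out.append(f"\t{value}\t{count}")
    return "\n".join(out)


if __name__ == "__main__":
    sample = ["mange,manger,V+PR+1+s", "mangent,manger,V+PR+3+p",
              "chat,,N+m+s", "chats,,N+m+pp"]
    print(report(property_values(sample)))
```

In this made-up sample, the value pp under N appears only once, which is exactly the kind of slip the listing helps to catch.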
paradigm-analysis
Given a non-inflected dictionary, this program produces a tab-separated text table with four columns:
  • paradigm name,
  • categories the paradigm applies to,
  • number of times the paradigm is used,
  • list of baseforms the paradigm applies to.
Optionally, the baseforms are listed only for paradigms that have no more than a given number of them. The text file can then be opened with MS Excel or OpenOffice Calc and used as a standard spreadsheet. For example, with the DM I obtain this Excel file (listing at most 20 baseforms): dm-modeles.xls.
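A minimal Python sketch of the same kind of table, assuming non-inflected entries of the shape baseform,CAT+FLX=PARADIGM+... where lines starting with # are comments. Entries without an FLX property are skipped as invariable. The function paradigm_table and its max_baseforms parameter are hypothetical, and leaving the fourth column empty when a paradigm has too many baseforms is one possible reading of the limit described above, not necessarily what paradigm-analysis does.

```python
"""Hypothetical sketch: tabulate inflectional paradigms of a NooJ dictionary.

Assumption: entries look like 'baseform,CAT+FLX=PARADIGM+other' and lines
starting with '#' are comments.
"""
from collections import defaultdict


def paradigm_table(lines, max_baseforms=20):
    """Return TSV rows: paradigm, categories, usage count, baseforms (or '')."""
    by_paradigm = defaultdict(lambda: {"cats": set(), "bases": []})
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blank and comment lines
            continue
        baseform, info = line.split(",", 1)
        category, *props = info.split("+")
        for prop in props:
            if prop.startswith("FLX="):  # the inflectional paradigm property
                entry = by_paradigm[prop[len("FLX="):]]
                entry["cats"].add(category)
                entry["bases"].append(baseform)
    rows = []
    for name in sorted(by_paradigm):
        cats = by_paradigm[name]["cats"]
        bases = by_paradigm[name]["bases"]
        # List the baseforms only when there are not too many of them.
        listed = ", ".join(bases) if len(bases) <= max_baseforms else ""
        rows.append("\t".join([name, " ".join(sorted(cats)), str(len(bases)), listed]))
    return "\n".join(rows) + "\n"


if __name__ == "__main__":
    sample = ["# NooJ dictionary", "chat,N+FLX=CHAT+Anim", "rat,N+FLX=CHAT+Anim",
              "aimer,V+FLX=AIMER", "vite,ADV"]
    print(paradigm_table(sample), end="")
```

Writing the returned string to a .txt file gives a tab-separated table that a spreadsheet program can import directly.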