A Corpus Processor - A Linguistic Development Environment -  A Linguistic Engine for developing Natural Language Processing software Applications.


Max Silberztein

Google Scholar

I wish to express many thanks to my colleagues and students and all NooJ users who have contributed to help enhance INTEX, and now NooJ, with their patience, criticisms, creative ideas and ambitious expectations.

1985-1989: I did my PhD at the LADL laboratory of the Université Paris 7 (Prof. Maurice Gross). I used the linguistic data collected at the LADL to co-design with Blandine Courtois the first system of electronic dictionaries, i.e. dictionaries that automatic software can use to perform linguistic analyses: the DELA. I constructed the first full-coverage dictionary of multiword units for French: the DELAC. I developed the first automatic Lexical Parser for Natural Languages, which can recognize and represent all elements of Natural Languages' vocabularies: intra-word units, simple words, multiword units as well as discontiguous expressions, and all the corresponding potential ambiguities.

1990-1991: I worked at the Institute for the Learning Sciences at Northwestern University (Prof. R. Schank) and conceived a platform to represent scenarios and goals in a Deterministic A.I. system (no stochastic, statistical, neural-network or Machine Learning approaches). The platform included a Natural Language Parser; for example, in a "Buy a baguette in Paris" scenario, the student would have to type in sentences in French, and the computer would answer with a video clip, guiding the student until they left the bakery with a "baguette pas trop cuite coupée en deux" and their correct change (while being polite!).

1991-2002: I developed INTEX as a platform to formalize the linguistic resources developed at the LADL and to process them in order to perform automatic linguistic analyses of texts. By imposing a common methodology and format for representing lexical, morphological and syntactic information, INTEX was instrumental in the creation of the RELEX network of laboratories, which formalized the lexicons and morphology of over 10 languages.

1999-2002: Invited by the IBM Watson Research Center (NY), I developed a Finite-State Transducer Toolbox used by IBM's LanguageWare Middleware offering.

2002-: I have been working on NooJ.

2013: I participated in the European CESAR Metanet program, which constructed an open-source Java version of NooJ associated with linguistic resources for nine European languages.

2020-: I am working on the ATISHS software, which offers the tools needed by researchers in the Humanities and the Social Sciences to construct and analyze their own corpora. As opposed to the tools currently available in the Digital Humanities, ATISHS uses carefully handcrafted linguistic resources to process meaningful linguistic units, rather than graphical wordforms.