The Author

NooJ

A Corpus Processor - A Linguistic Development Environment - A Linguistic Engine for developing Natural Language Processing software Applications.

Max Silberztein

I am a professor at the Université de Franche-Comté.

In French:

— My course on formal grammars.

— My course on digital humanities tools for the statistical analysis of texts.

— My course on the formalization of languages.

— My course on databases.

In English:

— My NooJ video tutorial and the corresponding PDF presentations.

My research

I don't like magical results produced by black boxes. I like to understand how things work.

ORCID: https://orcid.org/0000-0003-0930-6463

Google Scholar

My publications.

I wish to express many thanks to my colleagues, my students and all the NooJ users who have helped enhance INTEX, NooJ, ATISHS and now WebNooJ with their patience, criticisms, creative ideas and ambitious expectations.

1981-1985: Engineering degree at the École des Mines de Douai. There, I especially enjoyed my summer jobs as a coal miner at the Aix-en-Provence mine, a surveyor at the Nogent-sur-Seine nuclear power plant, an industrial designer in a naval shipyard in Brittany, and a programmer in the Robotics department of the Technion University, Haifa, Israel.

1985-1989: I did my PhD at the LADL laboratory of the Université Paris 7 (Prof. Maurice Gross). I used the linguistic data collected at the LADL to co-design, with Blandine Courtois, the first system of electronic dictionaries, i.e. dictionaries that can be used by software to perform automatic linguistic analyses: the DELA. I constructed the first full-coverage dictionary of multiword units for French: the DELAC. I developed the first automatic Lexical Parser for Natural Languages, which can recognize and represent all elements of Natural Languages' vocabularies: intra-word units, simple words, multiword units as well as discontiguous expressions, and all the corresponding potential ambiguities.

1990-1991: I worked at the Institute for the Learning Sciences at Northwestern University (Prof. R. Schank), where I developed a platform to represent scenarios and goals in a Deterministic (as opposed to Empirical) A.I. system. The platform included a Natural Language Parser; for example, in a "Buy a baguette in Paris" scenario, the student would have to type in sentences in French, then the computer would answer with a video clip, guiding the student until they left the bakery with a "baguette pas trop cuite coupée en deux" (a baguette, not overbaked, cut in two) and the correct change (having been polite throughout!).

1991-2002: I developed INTEX as a platform to formalize the linguistic resources developed at the LADL and to process them in order to perform automatic linguistic analyses of texts. By imposing a common methodology and format for representing lexical, morphological and syntactic information, INTEX was instrumental in the creation of the RELEX network of laboratories, which formalized the lexicons and morphology of a dozen languages.

1999-2002: Invited by the IBM Watson Research Center (NY), I developed a Finite-State Transducer Toolbox used by IBM's LanguageWare middleware offering.

2002-: I have been working on NooJ, ATISHS and WebNooJ.

2013: I participated in the European CESAR Metanet program, which constructed an open-source Java version of NooJ associated with linguistic resources for nine European languages.

2020-: I have developed the ATISHS software, which offers the tools needed by researchers in the Humanities and the Social Sciences to construct and analyze their own corpora. As opposed to the tools currently available in the Digital Humanities, ATISHS uses carefully handcrafted linguistic resources to process meaningful linguistic units, rather than graphical wordforms.

2022-: I am working on a web interface for NooJ: WebNooJ, which runs on a Linux server.