
Mădălina Chitez
Short bio:
Mădălina Chitez is a Senior Researcher (Assoc. Prof.) in applied corpus linguistics at the Department of Modern Languages and Literatures, West University of Timișoara, Romania. She holds a PhD from the Albert-Ludwigs University of Freiburg, Germany, and has held research positions at Oxford, Siena, and Zurich. She is the president of CODHUS (Centre for Corpus-Related Digital Approaches to Humanities), conducting research at the intersection of corpus linguistics, NLP, and educational studies.
Tutorial: Corpus-Driven NLP for Educational Insights: Tools, Techniques, and Use Cases
This lecture explores the development of corpus-driven NLP applications designed to generate educational insights from Romanian-language data. Focusing on domain-specific annotated corpora, it introduces the ROGER and EXPRES academic writing platforms, which enable advanced corpus queries and the extraction of multi-word expressions, rhetorical move patterns, and academic discourse features through shallow parsing and n-gram analysis. These resources have informed the creation of derived lexical datasets, such as structured academic phrase lists and a phrasal lexicon for writing support. The session also presents the LEMI platform, a readability assessment tool tailored to Romanian children’s literature, which integrates surface-level and syntactic complexity metrics to support age-appropriate text selection. Emphasis will be placed on corpus compilation, annotation schemes, feature engineering, and the integration of linguistic insights into NLP pipelines for educational technology.
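As a rough illustration of the n-gram analysis mentioned above, the sketch below counts recurring word trigrams, a common first step towards a list of candidate multi-word expressions. It is a minimal example using only the Python standard library; the sample sentences, the function names, and the frequency threshold are assumptions made for illustration and are not taken from ROGER or EXPRES.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def ngram_counts(texts, n=3):
    """Count word n-grams across texts; n-grams never cross text boundaries."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        # zip over n shifted copies of the token list yields consecutive n-grams
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


# Toy sentences invented for illustration (not corpus data).
sample = [
    "The results of the study show a clear trend.",
    "The results of the survey are discussed below.",
    "We present the results of the experiment.",
]

trigrams = ngram_counts(sample, n=3)
# Keep only trigrams that occur at least twice (hypothetical threshold).
frequent = [(" ".join(gram), count) for gram, count in trigrams.most_common() if count >= 2]
print(frequent)
```

In practice the candidate list would be filtered further, for example by part-of-speech pattern or by an association measure, before it becomes a phrase list.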
Practical session: From Text to Pedagogy: Data Mining Approaches to Educational Content
This hands-on session introduces participants to data mining techniques for analyzing educational materials, with a focus on Romanian school textbooks. Using pre-processed data from the ROTEX corpus and NLP pipelines for verb extraction and Bloom's Taxonomy labeling, participants will explore how instructional tasks reflect cognitive complexity across grades and publishers. Activities will include comparative analysis of textbooks and curriculum standards, identification of task patterns (e.g., prevalence of lower- vs. higher-order thinking), and visualization of cognitive progression using syntactic and semantic annotations. The session emphasizes how computational methods such as POS tagging, pattern matching, and task classification can be used to evaluate educational content quality and pedagogical coherence, particularly in low-resource educational contexts. Participants will leave with hands-on experience in aligning linguistic features with pedagogical frameworks and discussing the broader implications for curriculum development and instructional design.
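The verb-based labeling idea can be sketched as a simple lexicon lookup. The example below is only an assumption about how such a step might look: the verb-to-level lexicon is hypothetical and in English for readability, the tasks are invented, and the session's own pipeline on ROTEX may work quite differently (for instance, by using POS tagging to find the verb).

```python
from collections import Counter

# Levels of the revised Bloom's Taxonomy, from lower- to higher-order.
BLOOM_LEVELS = ["remember", "understand", "apply", "analyze", "evaluate", "create"]

# Hypothetical verb-to-level lexicon, for illustration only.
VERB_TO_LEVEL = {
    "list": "remember", "define": "remember",
    "explain": "understand", "summarize": "understand", "describe": "understand",
    "solve": "apply",
    "compare": "analyze", "classify": "analyze",
    "justify": "evaluate", "argue": "evaluate",
    "design": "create", "compose": "create",
}


def label_task(task):
    """Return the Bloom level of the first known verb in the task, else None."""
    for word in task.lower().split():
        level = VERB_TO_LEVEL.get(word.strip(".,;:!?'\""))
        if level is not None:
            return level
    return None


def level_distribution(tasks):
    """Count how many tasks fall on each Bloom level."""
    counts = Counter(label_task(task) for task in tasks)
    return {level: counts.get(level, 0) for level in BLOOM_LEVELS}


# Invented tasks grouped by grade (not taken from any textbook).
tasks_by_grade = {
    5: ["List the main characters.", "Define the term 'noun'.", "Explain why the hero leaves home."],
    8: ["Compare the two poems.", "Justify your answer with examples.", "Describe the setting."],
}

for grade, tasks in tasks_by_grade.items():
    print(grade, level_distribution(tasks))
```

Comparing the per-grade distributions gives a first, coarse view of whether higher-order tasks become more frequent in later grades.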

Dan Cristea
Professor at the Faculty of Computer Science, "Alexandru Ioan Cuza" University of Iași, corresponding member of the Romanian Academy, and Scientific Researcher I at the Institute of Computer Science, Romanian Academy, Iași Branch.

Andrei Scutelnicu
Lecturer at the Faculty of Computer Science, "Alexandru Ioan Cuza" University of Iași, and Scientific Researcher at the Institute of Computer Science, Romanian Academy, Iași Branch.
Abstract:
DeLORo (Deep Learning for Old Romanian) is a project intended to build a technology capable of deciphering old printed and uncial Cyrillic Romanian documents and transliterating them into the Latin script. In this paper we concentrate on organising and acquiring the primary data needed to train the deep learning recognition technology. After a brief overview of similar enterprises, we compare our approach with others. Then, we present in some detail the structure of DeLORo’s data repository, which includes: images of scanned pages, annotations operated over them, and alignments between annotated objects in the images and sequences of decoded Latin characters. The presentation will focus on the practical module, where a tutorial on how to use the platform will be given.

Mihai Dascălu
Mihai Dascalu is a professor in Computer Science at the National University of Science and Technology POLITEHNICA Bucharest and is responsible for the courses in object-oriented programming, algorithm design, and semantic web. Mihai was head of his class (i.e., GPA 10/10; ranked 1st across specialization and university) at POLITEHNICA Bucharest and holds a double Ph.D. with the highest distinctions in Computer Science (Excellent, POLITEHNICA Bucharest) and Educational Sciences (Très Honorable avec Felicitations, University Grenoble Alpes, France). Mihai has extensive experience in national and international research projects (POC Optimize, POC InsureAI, PTE ATES, PCE INTELLIT, PCE Lib2Life, PCE ROBIN, POC D HUB-TECH, POC G NETIO, H2020 RAGE, ERASMUS+ ENeA-SEA) with more than 300 published papers, including 40 articles at top-tier conferences (AAAI, ACL, CogSci, AIED, COLING), 100+ papers indexed ISI at renowned international conferences (ICALT, ITS, EC-TEL, ICWL, ISPDC, AIMSA), and 10+ Q1 journal papers (Computers & Education, Computers in Human Behavior, Behavior Research Methods, ijCSCL). Complementary to his competencies in ML/NLP, AI in education, and discourse analysis, Mihai holds a multitude of professional certifications (e.g., PMP, PMI-RMP, PMI-ACP, CISA, C|EH, and CISSP). Moreover, Mihai received the award “Mihai Drăgănescu” from the Romanian Academy in 2023, the distinction “IN TEMPORE OPPORTUNO” in 2013 as the most promising young researcher in POLITEHNICA Bucharest, obtained a Senior Fulbright scholarship in 2015, and holds a US patent. Mihai is also a corresponding member of the Academy of Romanian Scientists.

Andreea Duțulescu
Andreea Dutulescu is a PhD student at the National University of Science and Technology Politehnica Bucharest. Her research interests include natural language processing (NLP) in education, synthetic data generation, and related areas in applied machine learning. Andreea has gained experience in both academic and industrial settings. She has completed multiple internships at Google, where she worked on practical ML tasks and contributed to real-world applications. In her academic work, she has been involved in various research projects involving NLP.
Abstract:
The practical session focuses on leveraging vLLM, a high-performance inference library, to achieve structured and accelerated processing with LLMs. Students will learn how to optimize inference workflows, enabling faster response times and efficient resource utilization without compromising output quality. The session also covers generating structured outputs, such as text that matches a regex pattern or JSON-formatted responses, enhancing clarity and usability for downstream applications and data extraction. Through hands-on exercises, attendees will gain practical skills to implement vLLM for scalable, low-latency NLP workflows that deliver both speed and precise, interpretable results.
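To make the idea of structured outputs concrete, here is a minimal sketch of the downstream side: checking that a raw model response is valid JSON with the expected fields and that one field matches a regex. It deliberately does not call vLLM, since the library's structured-output options differ between versions; the field names, the date pattern, and the `parse_structured` helper are all assumptions made for illustration.

```python
import json
import re

# Hypothetical expected shape of the model's answer.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")
REQUIRED_FIELDS = {"title": str, "date": str, "attendees": list}


def parse_structured(raw):
    """Parse raw model output as JSON and check it against the expected shape.

    Returns the parsed dict on success and raises ValueError otherwise.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            raise ValueError(f"field {field!r} is missing or not a {expected_type.__name__}")
    if not DATE_PATTERN.fullmatch(data["date"]):
        raise ValueError("date must look like YYYY-MM-DD")
    return data


# Invented example of a well-formed response.
good = '{"title": "Team meeting", "date": "2025-07-01", "attendees": ["Ana", "Ion"]}'
print(parse_structured(good)["title"])
```

When decoding is constrained at generation time, a check like this should rarely fail, but it remains a cheap safeguard before the output reaches downstream code.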

Liviu Dinu
Short bio:
Liviu P. Dinu is a professor at the University of Bucharest, Computer Science Department, director of the Computer Science Doctoral School, director of the Human Language Technologies Research Center, and member of the Computer Science and Interdisciplinary Doctoral Schools. His main research is in Computational Linguistics and Natural Language Processing (including themes like language similarity, computational historical linguistics, computational stylometry, applications in psychology, phono-morphological computational analysis, etc.). Solomon Marcus was his PhD supervisor (PhD obtained in 2003), and in 2014 he defended his habilitation thesis entitled "Similarity and Decision Problems in Computational Linguistics". In 2007 he received the "Grigore C. Moisil" Prize, awarded by the Romanian Academy (for 2005). He has published 2 books, 8 book chapters, and over 180 papers in international journals and conference proceedings, and has been involved in 31 funded national and international R&D projects (in 17 of them as principal investigator). In 2020 he also initiated a master's program in Natural Language Processing at the University of Bucharest.
Abstract:
Natural languages are living ecosystems: they are constantly in contact and, as a consequence, they change continuously. Traditionally, the main Historical Linguistics (HL) problems (How are languages related? How do languages change across space and time?) have been investigated with the instruments of comparative linguistics.
We propose here, for Romance languages, computer-assisted methods for the main problems in HL (related word discrimination, protoword reconstruction, language similarity, semantic divergence, etc.).
Our studies on Romance languages rely on a digital resource for HL that we constructed and published (RoBoCoP - ROmance BOrrowing COgnate Package), containing a comprehensive and reliable database of Romance cognates and borrowings based on the etymological information provided by publicly available dictionaries in five languages: Spanish, Italian, French, Portuguese, and Romanian (the largest known database of this kind).
To answer the first question, we are interested not only in the phylogenetic classification of natural languages, but also in the degree of similarity between two languages. Using various techniques and metrics, we offer an answer at three levels: phonetic, lexical, and syntactic.
For the second question, the RoBoCoP dataset allowed us to perform the most extensive experiments to date on a series of HL tasks for Romance languages, including cognate identification, cognate-borrowing discrimination, borrowing direction detection, automatic protoword reconstruction, and semantic divergence. We used machine learning models for sequence modelling, including encoder-decoder transformers in the Flan-T5 family and conditional random fields, and recently obtained state-of-the-art results on these tasks. The results show that computer-assisted methods, in which computational methods are integrated with linguistic knowledge, are a viable direction for tackling these problems in historical linguistics.
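As a point of reference for the lexical level, a very simple orthographic similarity between word forms can be computed with normalized edit distance. The sketch below is only a baseline illustration and is not the method used in the experiments described above, which rely on machine learning sequence models; the function names are assumptions, and the word list (reflexes of Latin "noctem") is a small hand-picked example rather than data from RoBoCoP.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))


# Words for "night" in the five languages (illustrative example).
night = {"es": "noche", "it": "notte", "fr": "nuit", "pt": "noite", "ro": "noapte"}

for (lang_a, word_a), (lang_b, word_b) in combinations(night.items(), 2):
    print(f"{lang_a}-{lang_b}: {similarity(word_a, word_b):.2f}")
```

Orthographic distance ignores regular sound correspondences, which is one reason learned sequence models are attractive for cognate tasks.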

Mark Finlayson
Short bio:
Dr. Mark A. Finlayson is an Associate Professor of Computer Science and Graduate Program Director in the Knight Foundation School of Computing and Information Sciences at Florida International University (FIU). He received his Ph.D. in Computer Science and Cognitive Science from MIT in 2012 under the supervision of Patrick H. Winston. He also received his M.S. from MIT in 2001 and his B.S. from the University of Michigan in 1998, both in Electrical Engineering. Before joining SCIS he was a Research Scientist in MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) for 2½ years. His research focuses on representing, extracting, and using higher-order semantic patterns in natural language, especially focusing on narrative. His work intersects artificial intelligence, computational linguistics, and cognitive science. He directs the Cognac Laboratory (The Cognition, Language, and Culture lab), whose members focus on investigating the science of narrative from a computational point of view. His research has been funded by the NSF, NIH, DARPA, OSD, ONR, DHS, and IBM. He was the recipient of an NSF CAREER Award in 2018 and an IBM Faculty Award in 2019. He was named Edison Fellow for Artificial Intelligence for 2019-2021 at the US Patent and Trademark Office (USPTO). He has received multiple teaching awards at FIU, plus an FIU faculty award for research and creative activities in 2019.
Tutorial: The Basics of Linguistic Annotation
This lecture will introduce the basics of linguistic annotation. We will motivate a continued interest in linguistic annotation by discussing its fundamental importance in the age of LLMs. Then we will discuss defining an annotation task, obtaining or implementing annotation tools, writing annotation guides and schemes, the recruitment and training of annotators and adjudicators, and the actual process of annotation itself.
Hands-on Exercises: The Process of Annotation
During this hands-on session, students will split into small teams and conduct a small-scale annotation with instructor-provided materials. Teams will go through at least two rounds of annotation and adjudication, to learn first-hand about the often subtle problems that arise when idealized annotation schemes meet real-world data.
Tutorial & Hands-on Exercises: Inter-Annotator Agreement
Measuring agreement between annotators is absolutely fundamental to annotation, in particular for validating annotation schemes, monitoring the quality of annotator training, and assuring the final quality of the annotations. In the first half of this session we will review a number of standard ways of computing agreement. In the second half, student teams will re-form and compute agreement for their annotated data, and will use these computations, together with detailed disagreement analysis, to develop insights into possible next steps in refining their annotation task.
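One standard agreement measure for two annotators assigning categorical labels is Cohen's kappa, defined as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each annotator's label frequencies. The sketch below is a minimal implementation for illustration; the session may cover other measures as well, and the labels shown are invented.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two annotators' marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)


# Invented labels for ten items.
annotator_1 = ["POS", "POS", "NEG", "NEG", "POS", "NEG", "POS", "POS", "NEG", "NEG"]
annotator_2 = ["POS", "POS", "NEG", "POS", "POS", "NEG", "NEG", "POS", "NEG", "NEG"]

# Observed agreement is 0.8 and chance agreement is 0.5, so kappa is about 0.6.
print(round(cohen_kappa(annotator_1, annotator_2), 3))
```

Listing the items on which the two annotators disagree is usually the next step, since those cases point to unclear parts of the annotation guide.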
Tutorial & Discussion: Corpus Assembly & Distribution; The Future of Annotation
For annotated data to be useful to the scientific enterprise, it must be made available to other scientists. In the first half of this session we discuss best practices for corpus collation, formatting, and public release. We will discuss ease of use, ease and permanence of access, and intellectual property concerns. In the latter half of the session we will have a general discussion of the future of annotation in light of recent advances in LLMs and human-machine teaming.

Nancy Ide
Vassar College, Poughkeepsie, New York
Research Professor of Computer Science, Brandeis University, Waltham, Massachusetts
Honorary Professor, Universitatea Alexandru Ioan Cuza,
Iasi, Romania
SPECIAL GUEST SPEECH
e-mail : ide@cs.vassar.edu

Adrian Iftene
Short bio: Adrian Iftene is a Romanian computer scientist and professor at the Faculty of Computer Science, "Alexandru Ioan Cuza" University of Iași (UAIC). He currently serves as Vice-Rector for digitalization, student and alumni affairs, and cooperation with socio-economic stakeholders at UAIC. His research expertise encompasses artificial intelligence, natural language processing, and mixed realities, with a particular focus on text processing tasks such as entity recognition, sentiment analysis, and question answering in both Romanian and English.

Cristian Simionescu
Short bio: Cristian Simionescu is a PhD student at the Faculty of Computer Science, Alexandru Ioan Cuza University of Iași, Romania. He researches self-supervised deep learning for medical data, with a particular focus on medical image analysis. He also collaborates with Nexus Media, applying deep learning and AI techniques to real‐world problems. His interests include deep learning and applied artificial intelligence.
Large Language Models (LLMs) are often presented as “conversational” engines, yet their true power lies in driving concrete actions within software systems. In these two hands-on sessions, you will learn how to adapt a pre-trained LLM to your own domain and seamlessly embed it in an application pipeline. We’ll start by fine-tuning or prompt-engineering an open-source model for a targeted vocabulary (e.g. online retail terminology), then connect the model to a simple web service or voice-driven interface. By the end of the two sessions you will have built two fully functional demos:
1. Natural-Language e-Commerce: “Add three summer dresses and a pair of sneakers to my cart” – the model parses the request, looks up SKUs, and issues API calls to an example store.
2. Voice-Activated Workflows: “Schedule tomorrow’s team meeting, send invites, and share the agenda” – the model interprets high-level commands and triggers microservice operations via a REST interface.
Along the way, we’ll cover best practices for prompt design, error handling, and performance monitoring in production. This session is ideal for linguists, developers, and data scientists who want to move beyond chatbots and build real-world applications powered by LLMs. Bring your laptop and get ready to turn language into action!
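The first demo above can be sketched as a small dispatch layer between the model and the store. The example below assumes the model has already been prompted to return a JSON list of actions; the action names, the catalogue, the SKUs, and the in-memory cart are all hypothetical stand-ins for the real store API, which the sessions will define.

```python
import json

# Hypothetical product catalogue: item name -> SKU.
CATALOG = {"summer dress": "SKU-1001", "sneakers": "SKU-2002"}


def add_to_cart(cart, item, quantity):
    """Look up the SKU and add the quantity to the in-memory cart."""
    sku = CATALOG.get(item)
    if sku is None:
        raise ValueError(f"unknown item: {item}")
    cart[sku] = cart.get(sku, 0) + quantity


# Registry of actions the application is willing to execute.
ACTIONS = {"add_to_cart": add_to_cart}


def execute(model_output, cart):
    """Run each action from the model's JSON output; collect errors instead of crashing."""
    errors = []
    for call in json.loads(model_output):
        handler = ACTIONS.get(call.get("action"))
        if handler is None:
            errors.append(f"unsupported action: {call.get('action')}")
            continue
        try:
            handler(cart, **call.get("args", {}))
        except (ValueError, TypeError) as exc:
            errors.append(str(exc))
    return errors


# Invented model output for "Add three summer dresses and a pair of sneakers to my cart".
model_output = json.dumps([
    {"action": "add_to_cart", "args": {"item": "summer dress", "quantity": 3}},
    {"action": "add_to_cart", "args": {"item": "sneakers", "quantity": 1}},
    {"action": "apply_coupon", "args": {"code": "SUMMER"}},
])

cart = {}
errors = execute(model_output, cart)
print(cart, errors)
```

Restricting execution to an explicit registry of actions is one simple form of error handling: anything the model invents is reported rather than run.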

Marius Leordeanu
Short bio:
I am a Professor of Computer Science at the Politehnica University of Bucharest and Senior Researcher at the Institute of Mathematics of the Romanian Academy (IMAR). In everything I do, I try to push our understanding of Intelligence further and to study the ways it connects to the world and constructs our reality of mind.
At Politehnica, I introduced in 2014, for the first time in our Computer Science Department, the courses of Computer Vision and Robotics and at IMAR I started the Advanced Computer Vision Seminar, with weekly meetings since 2016. I received my PhD from the Robotics Institute of Carnegie Mellon University, in the USA (2009) and Bachelors in Computer Science and Mathematics from the City University of New York (2003).
For my work on unsupervised learning for graph matching I received, in 2014, as a single recipient, the “Grigore Moisil” Prize, the most prestigious award in Mathematics given by the Romanian Academy (please see my book: Unsupervised Learning in Space and Time, Springer, 2020). Now, in 2024, for my scientific results obtained throughout my career together with my wonderful group of young researchers and doctoral students, I received the Grand Prize in Computer Science and Mathematics, at the Romanian Research Gala (February, 2024).
Besides science, I am interested in music, with one professional album and another single composed and co-produced ("Supersonic" 2022, "Cosmic Attraction" 2024). I write poetry (with two volumes published: "Povestea unui Cuvant" 2013 and "Odiseu si Destinul" 2024, and another one to be published in 2024 at Curtea Veche Publishing House) and prose (with one book published: "Ma numesc albastru", 2016). In my activities I always aim to connect, in a natural and coherent way, from both human and computational perspectives, different domains of human experience, in order to better understand the real world, at the interplay between mind and matter.
TBA...

Rada Mihalcea
Rada Mihalcea is the Janice M. Jenkins Professor of Computer Science and Engineering at the University of Michigan and the Director of the Michigan Artificial Intelligence Lab. Her research interests are in computational linguistics, with a focus on lexical semantics, multilingual natural language processing, and computational social sciences. She was a program co-chair for EMNLP 2009 and ACL 2011, and a general chair for NAACL 2015 and *SEM 2019. She is an ACM Fellow, a AAAI Fellow, and a former President of the ACL. She is the recipient of a Sarah Goddard Power award for her contributions to diversity in science, an honorary citizen of her hometown of Cluj-Napoca, Romania, and the recipient of a Presidential Early Career Award for Scientists and Engineers awarded by President Obama.

Angana Borah
Angana Borah is a PhD candidate in Computer Science and Engineering at the University of Michigan, advised by Prof. Rada Mihalcea. Her research lies at the intersection of Natural Language Processing and Computational Social Science, with a focus on multi-agent interactions, cultural bias, and misinformation in LLMs. She has co-authored papers at ACL, NAACL, EMNLP, and AAAI. Angana has held research positions at institutions including Georgia Tech, University of Texas at Austin, and German Research Center for Artificial Intelligence (DFKI), and has also worked in industry at Microsoft. She has been invited to speak at venues such as NYU’s Center for Conflict and Cooperation and Microsoft Research Africa, and actively contributes to the NLP community as a reviewer and organizer for workshops and conferences, including ACL and COLING.
Using Multi-Agent Systems to Explore and Model Human Social Behavior
Recent advancements in multi-agent systems have led to a rapidly growing research area focused on simulating increasingly complex human behaviors — such as group consensus, implicit bias, persuasion, and, in some cases, cooperation or conflict. Yet, these systems exist in a paradox: they are computational and artificial, entirely lacking the intrinsic consciousness, emotions, and social intuition that define human individuals and societies. In this tutorial, we will explore the evolving relationship between AI agents and human behavior, drawing on large-scale generative agent experiments, studies on bias in multi-agent interactions, and insights into misinformation and group behavior. We will also discuss the broader implications of these systems — not only for the future of AI but also for human-centered disciplines such as psychology, sociology, and ethics, where they can challenge or facilitate our understanding of intelligence, agency, and collective decision-making.
This tutorial will also include a hands-on component, where we will demonstrate how to instrument LLM agents to explore various dimensions of social behavior in multi-agent settings, with additional exercises on evaluating agent interactions and testing for bias.

Dan Tufiș
Short bio:
Member of the Romanian Academy, Professor, Ph. D., Senior Scientist grade I.
Doctor Honoris Causa of the ”Agora” University of Oradea
Honorary Professor of the Faculty of Computer Science, Alexandru Ioan Cuza University of Iasi, Romania
European Language Resources Infrastructures: Case Study-ERIC CLARIN
Research Infrastructures (RIs) are large-scale facilities that require big investments and support researchers in doing their work. Today, there are 30 ERICs, in various scientific domains, and new ones are expected to be launched in the future. This talk will provide a brief overview of one of the most successful RIs, CLARIN, which Romania will hopefully join soon.
CLARIN is dedicated to Open Science, promoting the sharing and re-use of language data and the interoperability of data and services. It promotes comparative perspectives, multidisciplinary collaboration, transnational research, responsible data science and supports linguistic diversity: data covering many languages, tools for many languages, language resources in all modalities, discipline- & language-agnostic. Following the developments of CLARIN, we developed an LT portal in the same spirit (but much smaller, mainly for the Romanian language), which will be demonstrated in the hands-on session.

Vasile Păiș
Short bio:
Senior Researcher II, Romanian Academy
RELATE – A Portal for Language Technologies
The students will be introduced to a CLARIN-like collection of tools, resources and services. RELATE is a modular, state-of-the-art platform developed at RACAI (Dr. Păiș, Dr. Ion) that is used for processing written and spoken language (mainly, but not only, Romanian). Resources and technologies were developed in our institute as well as by partner institutions. RELATE is used in multiple national and international research projects. It was designed to use standardized file formats, ensuring interoperability with other language processing systems. Internal functions are available as JSON REST web services. In the Representational State Transfer (REST) architectural style, data and functionality are considered resources and are accessed using Uniform Resource Identifiers (URIs).
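To show what calling a JSON REST web service looks like in general, the sketch below builds a POST request with the Python standard library. It is not the RELATE API: the host, the resource name "tokenize", and the payload field "text" are placeholders invented for illustration, and the request is only constructed, not sent.

```python
import json
from urllib.request import Request

# Placeholder base URI, NOT the real RELATE address.
BASE_URL = "https://relate.example.org/api"


def build_request(resource, payload):
    """Build a JSON POST request for a resource identified by its URI."""
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return Request(
        url=f"{BASE_URL}/{resource}",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


# Hypothetical resource name and payload field.
req = build_request("tokenize", {"text": "Limba română este frumoasă."})
print(req.get_method(), req.full_url)
```

With a real endpoint, the request would be sent with `urllib.request.urlopen(req)` and the response body decoded with `json.loads`.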