Mădălina Chitez

Mădălina Chitez is a Senior Researcher (Assoc. Prof.) in applied corpus linguistics at the Department of Modern Languages and Literatures, West University of Timișoara, Romania. She holds a PhD from the Albert-Ludwigs University of Freiburg, Germany, and has held research positions at Oxford, Siena, and Zurich. She is the president of CODHUS (Centre for Corpus-Related Digital Approaches to Humanities), conducting research at the intersection of corpus linguistics, NLP, and educational studies.

Tutorial: Corpus-Driven NLP for Educational Insights: Tools, Techniques, and Use Cases

This lecture explores the development of corpus-driven NLP applications designed to generate educational insights from Romanian-language data. Focusing on domain-specific annotated corpora, it introduces the ROGER and EXPRES academic writing platforms, which enable advanced corpus queries and the extraction of multi-word expressions, rhetorical move patterns, and academic discourse features through shallow parsing and n-gram analysis. These resources have informed the creation of derived lexical datasets, such as structured academic phrase lists and a phrasal lexicon for writing support. The session also presents the LEMI platform, a readability assessment tool tailored to Romanian children’s literature, which integrates surface-level and syntactic complexity metrics to support age-appropriate text selection. Emphasis will be placed on corpus compilation, annotation schemes, feature engineering, and the integration of linguistic insights into NLP pipelines for educational technology.
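To make the kind of analysis concrete, here is a minimal sketch of frequency-based n-gram extraction of the sort that can feed an academic phrase list; the tokenization, the frequency threshold, and the toy corpus are illustrative assumptions, not the actual ROGER/EXPRES pipeline.

    # Candidate academic phrases as frequent word n-grams (illustrative only).
    from collections import Counter
    import re

    def ngrams(tokens, n):
        return zip(*(tokens[i:] for i in range(n)))

    def candidate_phrases(texts, n=3, min_freq=2):
        counts = Counter()
        for text in texts:
            tokens = re.findall(r"\w+", text.lower())   # naive word tokenization
            counts.update(" ".join(g) for g in ngrams(tokens, n))
        return [(p, f) for p, f in counts.most_common() if f >= min_freq]

    corpus = [
        "În această lucrare ne propunem să analizăm datele colectate.",
        "În această lucrare ne propunem să descriem metoda folosită.",
    ]
    print(candidate_phrases(corpus, n=4, min_freq=2))   # e.g. "în această lucrare ne"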

Practical session: From Text to Pedagogy: Data Mining Approaches to Educational Content

This hands-on session introduces participants to data mining techniques for analyzing educational materials, with a focus on Romanian school textbooks. Using pre-processed data from the ROTEX corpus and NLP pipelines for verb extraction and Bloom's Taxonomy labeling, participants will explore how instructional tasks reflect cognitive complexity across grades and publishers. Activities will include comparative analysis of textbooks and curriculum standards, identification of task patterns (e.g., prevalence of lower- vs. higher-order thinking), and visualization of cognitive progression using syntactic and semantic annotations. The session emphasizes how computational methods such as POS tagging, pattern matching, and task classification can be used to evaluate educational content quality and pedagogical coherence, particularly in low-resource educational contexts. Participants will leave with hands-on experience in aligning linguistic features with pedagogical frameworks and discussing the broader implications for curriculum development and instructional design.
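As a pointer to what the pre-built pipeline does, the sketch below extracts verbs from a textbook task prompt and maps their lemmas to Bloom levels; the spaCy model name and the tiny verb-to-level dictionary are assumptions for illustration, not the actual ROTEX components.

    # Verb extraction and Bloom's-taxonomy labeling for one task prompt (illustrative).
    import spacy

    nlp = spacy.load("ro_core_news_sm")   # assumes the Romanian spaCy model is installed

    BLOOM = {   # hand-made lemma-to-level mapping, for demonstration only
        "enumera": "Remember", "descrie": "Understand", "aplica": "Apply",
        "compara": "Analyze", "evalua": "Evaluate", "crea": "Create",
    }

    def label_task(task):
        doc = nlp(task)
        verbs = [t.lemma_.lower() for t in doc if t.pos_ == "VERB"]
        return [(v, BLOOM.get(v, "unmapped")) for v in verbs]

    print(label_task("Comparați cele două texte și evaluați argumentele autorului."))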

Dan Cristea

Professor at the Faculty of Computer Science, "Alexandru Ioan Cuza" University of Iași, corresponding member of the Romanian Academy, and Scientific Researcher I at the Institute of Computer Science, Romanian Academy, Iași Branch.

Andrei Scutelnicu

Lecturer at the Faculty of Computer Science, "Alexandru Ioan Cuza" University of Iași, and Scientific Researcher at the Institute of Computer Science, Romanian Academy, Iași Branch.

Abstract:

DeLORo (Deep Learning for Old Romanian) is a project that aims to build a technology capable of deciphering old printed and uncial Cyrillic Romanian documents and transliterating them into the Latin script. In this tutorial we concentrate on the organisation and acquisition of the primary data needed to train the deep learning recognition technology. After a brief overview of similar enterprises, we compare our approach with others. Then, we present in some detail the structure of DeLORo’s data repository, which includes: images of scanned pages, annotations operated over them, and alignments between annotated objects in the images and sequences of decoded Latin characters. The presentation will focus on the practical module, where a tutorial on how to use the platform will be given.
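To give a feel for what such an alignment looks like, here is a hypothetical record; the field names, the glyph sequence, and the file path are invented for illustration and need not match DeLORo's actual schema.

    # One hypothetical repository record: a region on a scanned page, its
    # transcribed Cyrillic glyphs, and the aligned Latin transliteration.
    record = {
        "page_image": "scans/ms_page_017.png",                # scanned page (invented path)
        "region": {"x": 412, "y": 230, "w": 180, "h": 38},    # bounding box of the word on the scan
        "cyrillic": "домнъ",                                  # transcribed glyph sequence
        "latin": "domn",                                      # aligned transliteration
    }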

Mihai Dascălu

Mihai Dascalu is a professor in Computer Science at the National University of Science and Technology POLITEHNICA Bucharest and is responsible for the courses in object-oriented programming, algorithm design, and semantic web. Mihai was head of his class (i.e., GPA 10/10; ranked 1st across specialization and university) at POLITEHNICA Bucharest and holds a double Ph.D. with the highest distinctions in Computer Science (Excellent, POLITEHNICA Bucharest) and Educational Sciences (Très Honorable avec Félicitations, University Grenoble Alpes, France). Mihai has extensive experience in national and international research projects with more than 300 published papers, including 40 articles at top-tier conferences and 10+ Q1 journal papers. Moreover, Mihai received the award “Mihai Drăgănescu” from the Romanian Academy in 2023, the distinction “IN TEMPORE OPPORTUNO” in 2013 as the most promising young researcher in POLITEHNICA Bucharest, obtained a Senior Fulbright scholarship in 2015, and holds a US patent. Mihai is also a corresponding member of the Academy of Romanian Scientists.

Andreea Duțulescu

Andreea Dutulescu is a PhD student at the National University of Science and Technology Politehnica Bucharest. Her research interests include natural language processing (NLP) in education, synthetic data generation, and related areas in applied machine learning. Andreea has gained experience in both academic and industrial settings. She has completed multiple internships at Google, where she worked on practical ML tasks and contributed to real-world applications. In her academic work, she has contributed to a variety of NLP research projects.

Abstract:

The practical session focuses on leveraging vLLM, a high-performance inference library, to achieve structured and accelerated processing with LLMs. Students will learn how to optimize inference workflows, enabling faster response times and efficient resource utilization without compromising output quality. The session also covers generating structured outputs constrained by regular-expression patterns and JSON schemas, enhancing clarity and usability for downstream applications and data extraction. Through hands-on exercises, attendees will gain practical skills to implement vLLM for scalable, low-latency NLP workflows that deliver both speed and precise, interpretable results.
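A minimal sketch of the kind of workflow the session builds on is given below; it assumes a local GPU, a Hugging Face model identifier of the participant's choice, and the core vLLM generation API (guided-decoding options for JSON or regex constraints differ across vLLM versions, so consult the version used in class).

    # Batched, deterministic generation with vLLM (model choice is an assumption).
    from vllm import LLM, SamplingParams

    llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")
    params = SamplingParams(temperature=0.0, max_tokens=128)

    prompts = [
        "Extract the course name and ECTS credits from 'Algorithm Design, 6 credits'. "
        "Answer as JSON with keys name and ects."
    ]
    for out in llm.generate(prompts, params):
        print(out.outputs[0].text)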

James Davenport

Short bio:
James Harold Davenport is a British computer scientist who works in computer algebra. Having done his PhD and early research at the Computer Laboratory, University of Cambridge, he is the Hebron and Medlock Professor of Information Technology at the University of Bath in Bath, England.

Abstract:
Generative AI emerged when the AI Act was nearly finalised. The speaker will polish his crystal ball and try to guess what happens as the standardisation process reacts to this development. The EU AI Act is one of the most significant pieces of AI legislation in the world. This is accepted, even if reluctantly, by all in AI. What is not often realised is that it is written as product safety legislation, a point of view that is unfamiliar to many in AI. Like other product safety legislation, it is written in fairly general terms, using words such as “unbiased”, leaving it to standards to describe what this actually means in detail. Again, this is unfamiliar to many in AI. We will therefore first look at the Act and the European standardisation process. In particular, we will look at the various actors in this process. Then we will look at the particular standardisation activities as they currently relate to Natural Language Processing.

Liviu Dinu

Short bio:
Liviu P. Dinu is a professor at the University of Bucharest, Computer Science Department, director of the Computer Science Doctoral School, director of the Human Language Technologies Research Center, and a member of the Computer Science and Interdisciplinary Doctoral Schools. His main research is in Computational Linguistics and Natural Language Processing (including themes such as language similarity, computational historical linguistics, computational stylometry, applications in psychology, phono-morphological computational analysis, etc.). He obtained his PhD in 2003 under the supervision of Solomon Marcus, and in 2014 he defended his habilitation thesis entitled "Similarity and Decision Problems in Computational Linguistics". In 2007 he received the "Grigore C. Moisil" Prize, awarded by the Romanian Academy (for 2005). He has published 2 books, 8 book chapters, and over 180 papers in international journals and conference proceedings, and has been involved in 31 funded national and international R&D projects (being principal investigator in 17 of them). In 2020 he also initiated a master's program in Natural Language Processing at the University of Bucharest.

Abstract:
Natural languages are living ecosystems: they are constantly in contact and, as a consequence, they change continuously. Traditionally, the main Historical Linguistics (HL) problems (How are languages related? How do languages change across space and time?) have been investigated with the instruments of comparative linguistics. We propose here, for the Romance languages, computer-assisted methods for the main problems in HL (related-word discrimination, protoword reconstruction, language similarity, semantic divergence, etc.). Our studies on Romance languages rely on a digital resource for HL that we constructed and published, RoBoCoP (ROmance BOrrowing COgnate Package), containing a comprehensive and reliable database of Romance cognates and borrowings based on the etymological information provided by publicly available dictionaries in five languages: Spanish, Italian, French, Portuguese, and Romanian (the largest known database of this kind). To answer the first question, we are interested not only in the phylogenetic classification of natural languages, but also in the degree of similarity between two languages; via various techniques and metrics we offer an answer at three levels: phonetic, lexical, and syntactic. For the second question, based on the RoBoCoP dataset we were able to perform the most extensive experiments to date on a series of HL tasks for the Romance languages, including cognate identification, cognate-borrowing discrimination, borrowing direction detection, automatic protoword reconstruction, and semantic divergence. We use computational methods based on machine learning models for sequence modelling, including encoder-decoder transformers in the Flan-T5 family and conditional random fields, and have recently obtained state-of-the-art results on these tasks, showing that computer-assisted methods, in which computational models are integrated with linguistic knowledge, are a viable direction for tackling these problems in historical linguistics.
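As an illustration of how such tasks can be framed, the sketch below poses protoword reconstruction as text-to-text generation with a Flan-T5-style encoder-decoder; the prompt format and the off-the-shelf checkpoint are stand-ins, not the fine-tuned models behind the reported results.

    # Protoword reconstruction framed as sequence-to-sequence generation (illustrative).
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tok = AutoTokenizer.from_pretrained("google/flan-t5-small")
    model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")

    # A cognate set for 'milk'; a model fine-tuned on RoBoCoP-style data would be used in practice.
    prompt = "Reconstruct the Latin etymon: ro: lapte | it: latte | fr: lait | es: leche | pt: leite"
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=16)
    print(tok.decode(out[0], skip_special_tokens=True))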

Mark Finlayson

Short bio:
Dr. Mark A. Finlayson is an Associate Professor of Computer Science and Graduate Program Director in the Knight Foundation School of Computing and Information Sciences at Florida International University (FIU). He received his Ph.D. in Computer Science and Cognitive Science from MIT in 2012 under the supervision of Patrick H. Winston. He also received his M.S. from MIT in 2001 and his B.S. from the University of Michigan in 1998, both in Electrical Engineering. Before joining SCIS he was a Research Scientist in MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) for 2½ years. His research focuses on representing, extracting, and using higher-order semantic patterns in natural language, especially focusing on narrative. His work intersects artificial intelligence, computational linguistics, and cognitive science. He directs the Cognac Laboratory (The Cognition, Language, and Culture lab), whose members focus on investigating the science of narrative from a computational point of view. His research has been funded by the NSF, NIH, DARPA, OSD, ONR, DHS, and IBM. He was the recipient of an NSF CAREER Award in 2018 and an IBM Faculty Award in 2019. He was named Edison Fellow for Artificial Intelligence for 2019-2021 at the US Patent and Trademark Office (USPTO). He has received multiple teaching awards at FIU, plus an FIU faculty award for research and creative activities in 2019.

Tutorial: The Basics of Linguistic Annotation
This lecture will introduce the basics of linguistic annotation. We will motivate a continued interest in linguistic annotation by discussing its fundamental importance in the age of LLMs. Then we will discuss defining an annotation task, obtaining or implementing annotation tools, writing annotation guides and schemes, the recruitment and training of annotators and adjudicators, and the actual process of annotation itself.

Hands-on Exercises: The Process of Annotation
During this hands-on session, students will split into small teams and conduct a small-scale annotation with instructor-provided materials. Teams will go through at least two rounds of annotation and adjudication, to learn first-hand about the often subtle problems that arise when idealized annotation schemes meet real-world data.

Tutorial & Hands-on Exercises: Inter-Annotator Agreement
Measuring agreement between annotators is absolutely fundamental to annotation, in particular for validating annotation schemes, monitoring the quality of annotator training, and assuring the final quality of the annotations. In the first half of this session we will review a number of standard ways of computing agreement. In the second half student teams will re-form and compute agreement for their annotated data, and will use these computations as well as detailed disagreement analysis to drive development of insights into possible next steps in refinement of their annotation task.
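As a preview of the computations covered, the snippet below measures agreement between two annotators with Cohen's kappa; the labels are toy data, and other coefficients (e.g. Krippendorff's alpha) are computed in a similar fashion.

    # Cohen's kappa over the same six items labeled by two annotators (toy data).
    from sklearn.metrics import cohen_kappa_score

    annotator_a = ["EVENT", "EVENT", "STATE", "EVENT", "STATE", "STATE"]
    annotator_b = ["EVENT", "STATE", "STATE", "EVENT", "STATE", "EVENT"]

    print(f"Cohen's kappa: {cohen_kappa_score(annotator_a, annotator_b):.2f}")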

Tutorial & Discussion: Corpus Assembly & Distribution; The Future of Annotation
For annotated data to be useful to the scientific enterprise, it must be made available to other scientists. In the first half of this session we discuss best practices for corpus collation, formatting, and public release. We will discuss ease of use, ease and permanence of access, and intellectual property concerns. In the latter half of the session we will have a general discussion of the future of annotation in light of recent advances in LLMs and human-machine teaming.

Nancy Ide

Short bio: Nancy Ide is Professor Emerita of Computer Science at Vassar College in Poughkeepsie, New York, and Research Professor of Computer Science at Brandeis University in Waltham, Massachusetts. Since 1997 she has been a co-organizer and professor in the EUROLAN Summer Schools, held biennially in Romania. She has published copiously in the field of computational linguistics in areas such as word sense disambiguation and lexical semantics and has been involved in several major resource-building projects, including MULTEXT, MULTEXT-EAST, the American National Corpus (ANC), and the Manually Annotated Sub-Corpus (MASC). In 1987 she co-founded the Text Encoding Initiative (TEI), for which she recently received the Antonio Zampolli Prize for major contribution to the field by the Association for Digital Humanities. Since then she has contributed to and developed standards for representing language resources for the International Standards Organization (ISO) and led the Language Applications Grid project, which developed a platform for interoperable use of diverse language processing software. In 2007 she founded the Association for Computational Linguistics Special Interest Group for Annotation (SIGANN) and served as its president until 2019. She co-edited the journal Computers and the Humanities from 1995-2004, and since then has served as co-editor-in-chief of the journal Language Resources and Evaluation. Professor Ide is also editor of the Springer book series Text, Speech and Language Technology. Currently, she serves as a member of the ELRA Language Resources Association (ELRA) Board and the core Program Committee for LREC 2026.
SPECIAL GUEST SPEECH: From Mechanical Minds to Neural Networks: Charting AI’s Evolution from the 1950s to Today

Artificial Intelligence has traveled a remarkable path since its conceptual roots in the 1950s, evolving from symbolic reasoning and rule-based systems to the data-driven, learning-centric approaches that power today’s cutting-edge technologies. This talk traces the major milestones and paradigm shifts that have shaped the development of AI over the past seven decades. We’ll explore the early era of logic and expert systems, the "AI winters" that tested the field’s resilience, and the emergence of machine learning and neural networks that redefined what machines can do. Along the way, we’ll highlight key breakthroughs, the individuals and institutions that propelled progress, and the social and technological forces that influenced each stage. By understanding this historical trajectory, we gain deeper insight into where AI stands today, and where it may be headed next.

Adrian Iftene

Short bio: Adrian Iftene is a Romanian computer scientist and professor at the Faculty of Computer Science, "Alexandru Ioan Cuza" University of Iași (UAIC). He currently serves as Vice-Rector for digitalization, student and alumni affairs, and cooperation with socio-economic stakeholders at UAIC. His research expertise encompasses artificial intelligence, natural language processing, and mixed realities, with a particular focus on text processing tasks such as entity recognition, sentiment analysis, and question answering in both Romanian and English.

Cristian Simionescu

Short bio: Cristian Simionescu is a PhD student at the Faculty of Computer Science, Alexandru Ioan Cuza University of Iași, Romania. He researches self-supervised deep learning for medical data, with a particular focus on medical image analysis. He also collaborates with Nexus Media, applying deep learning and AI techniques to real-world problems. His interests include deep learning and applied artificial intelligence.

From Chat to Action: Integrating LLMs into Real-World Applications
Large Language Models (LLMs) are often presented as “conversational” engines, yet their true power lies in driving concrete actions within software systems. In these two hands-on sessions, you will learn how to adapt a pre-trained LLM to your own domain and seamlessly embed it in an application pipeline. We’ll start by fine-tuning or prompt-engineering an open-source model for a targeted vocabulary (e.g., online retail terminology), then connect the model to a simple web service or voice-driven interface. By the end of the two sessions you will have built two fully functional demos:
1. Natural-Language e-Commerce: “Add three summer dresses and a pair of sneakers to my cart” – the model parses the request, looks up SKUs, and issues API calls to an example store (a code sketch of this flow follows the list).
2. Voice-Activated Workflows: “Schedule tomorrow’s team meeting, send invites, and share the agenda” – the model interprets high-level commands and triggers microservice operations via a REST interface.
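A minimal sketch of the first demo is given below; it assumes an open-source model served behind an OpenAI-compatible endpoint and a purely hypothetical store API, neither of which is prescribed by the session materials.

    # Demo 1 sketch: parse a shopping request with an LLM, then call a (hypothetical) store API.
    import json
    import requests
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")  # assumed local server
    STORE_API = "https://example-store.local/api"                               # hypothetical endpoint

    def parse_request(text):
        prompt = ('Return only JSON of the form {"items": [{"name": str, "quantity": int}]}.\n'
                  f"Request: {text}")
        reply = client.chat.completions.create(
            model="local-model",   # placeholder model name
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
        )
        return json.loads(reply.choices[0].message.content)["items"]

    def add_to_cart(text):
        for item in parse_request(text):
            # Look up a SKU for each parsed item, then issue the cart API call.
            sku = requests.get(f"{STORE_API}/search", params={"q": item["name"]}).json()["sku"]
            requests.post(f"{STORE_API}/cart", json={"sku": sku, "qty": item["quantity"]})

    add_to_cart("Add three summer dresses and a pair of sneakers to my cart")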

Along the way, we’ll cover best practices for prompt design, error handling, and performance monitoring in production. This session is ideal for linguists, developers, and data scientists who want to move beyond chatbots and build real-world applications powered by LLMs. Bring your laptop and get ready to turn language into action!

Marius Leordeanu

Short bio:
I am Professor of Computer Science at the Politehnica University of Bucharest and Senior Researcher at the Institute of Mathematics of the Romanian Academy (IMAR). In everything I do, I try to push further our understanding of Intelligence and to study the ways in which it connects to the world and constructs our reality of mind. At Politehnica, in 2014 I introduced, for the first time in our Computer Science Department, the Computer Vision and Robotics courses, and at IMAR I started the Advanced Computer Vision Seminar, with weekly meetings since 2016. I received my PhD from the Robotics Institute of Carnegie Mellon University, in the USA (2009), and my Bachelor's in Computer Science and Mathematics from the City University of New York (2003). For my work on unsupervised learning for graph matching I received, in 2014, as a single recipient, the “Grigore Moisil” Prize, the most prestigious award in Mathematics given by the Romanian Academy (please see my book: Unsupervised Learning in Space and Time, Springer, 2020). In 2024, for the scientific results obtained throughout my career together with my wonderful group of young researchers and doctoral students, I received the Grand Prize in Computer Science and Mathematics at the Romanian Research Gala (February 2024). Besides science, I am interested in music, with one professional album and another single composed and co-produced ("Supersonic" 2022, "Cosmic Attraction" 2024). I write poetry (with two volumes published, "Povestea unui Cuvant" 2013 and "Odiseu si Destinul" 2024, and another one to be published in 2024 at Curtea Veche Publishing House) and prose (with one book published, "Ma numesc albastru", 2016). In my activities I always aim to connect, in a natural and coherent way, from both human and computational perspectives, different domains of human experience, in order to better understand the real world at the interplay between mind and matter.

From Vision To Language through Graph of Events in Space and Time
The task of describing video content in natural language is commonly referred to as video captioning. Unlike conventional video captions, which are typically brief and widely available, long-form paragraph descriptions in natural language are scarce. This limitation of current datasets is due to the expensive manual annotation required and to the highly challenging task of explaining the language formation process from the perspective of the underlying story, as a complex system of interconnected events in space and time. Through a thorough analysis of recently published methods and available datasets, we identify a general lack of published resources dedicated to the problem of describing videos in complex language, beyond descriptions in the form of enumerations of simple captions. Furthermore, while state-of-the-art methods produce impressive results on the task of generating shorter captions from videos by direct end-to-end learning between videos and text, the problem of explaining the relationship between vision and language is still beyond our reach. In this presentation, I will propose a shared representation between vision and language, based on graphs of events in space and time, which can be obtained in an explainable and analytical way, to integrate and connect multiple vision tasks and produce the final natural language description. Moreover, I will also demonstrate how our automated and explainable video description generation process can function as a fully automatic teacher to effectively train direct, end-to-end neural student pathways within a self-supervised neuro-analytical system. We validate that our explainable neuro-analytical approach generates coherent, rich, and relevant textual descriptions on videos collected from multiple varied datasets, using standard evaluation metrics, human annotations, and consensus from ensembles of state-of-the-art VLMs.
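To make the representation concrete, a toy version of a graph of events in space and time is sketched below; the node schema and the trivial linearization are simplifications for exposition, not the representation presented in the talk.

    # Toy event graph: nodes are events with temporal extents, edges are relations.
    import networkx as nx

    g = nx.DiGraph()
    g.add_node("e1", actor="person", action="opens the door", time=(0.0, 1.2))
    g.add_node("e2", actor="person", action="enters the room", time=(1.0, 2.5))
    g.add_edge("e1", "e2", relation="precedes")

    # Linearize the graph into a rudimentary description.
    print(" and then ".join(g.nodes[n]["action"] for n in nx.topological_sort(g)))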

Rada Mihalcea

Rada Mihalcea is the Janice M. Jenkins Professor of Computer Science and Engineering at the University of Michigan and the Director of the Michigan Artificial Intelligence Lab. Her research interests are in computational linguistics, with a focus on lexical semantics, multilingual natural language processing, and computational social sciences. She was a program co-chair for EMNLP 2009 and ACL 2011, and a general chair for NAACL 2015 and *SEM 2019. She is an ACM Fellow, a AAAI Fellow, and a former President of the ACL. She is the recipient of a Sarah Goddard Power award for her contributions to diversity in science, an honorary citizen of her hometown of Cluj-Napoca, Romania, and the recipient of a Presidential Early Career Award for Scientists and Engineers awarded by President Obama.

Angana Borah

Angana Borah is a PhD candidate in Computer Science and Engineering at the University of Michigan, advised by Prof. Rada Mihalcea. Her research lies at the intersection of Natural Language Processing and Computational Social Science, with a focus on multi-agent interactions, cultural bias, and misinformation in LLMs. She has co-authored papers at ACL, NAACL, EMNLP, and AAAI. Angana has held research positions at institutions including Georgia Tech, University of Texas at Austin, and German Research Center for Artificial Intelligence (DFKI), and has also worked in industry at Microsoft. She has been invited to speak at venues such as NYU’s Center for Conflict and Cooperation and Microsoft Research Africa, and actively contributes to the NLP community as a reviewer and organizer for workshops and conferences, including ACL and COLING.

Using Multi-Agent Systems to Explore and Model Human Social Behavior

Recent advancements in multi-agent systems have led to a rapidly growing research area focused on simulating increasingly complex human behaviors — such as group consensus, implicit bias, persuasion, and, in some cases, cooperation or conflict. Yet, these systems exist in a paradox: they are computational and artificial, entirely lacking the intrinsic consciousness, emotions, and social intuition that define human individuals and societies. In this tutorial, we will explore the evolving relationship between AI agents and human behavior, drawing on large-scale generative agent experiments, studies on bias in multi-agent interactions, and insights into misinformation and group behavior. We will also discuss the broader implications of these systems — not only for the future of AI but also for human-centered disciplines such as psychology, sociology, and ethics, where they can challenge or facilitate our understanding of intelligence, agency, and collective decision-making.

This tutorial will also include a hands-on component, where we will demonstrate how to instrument LLM agents to explore various dimensions of social behavior in multi-agent settings, with additional exercises on evaluating agent interactions and testing for bias.
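A minimal sketch of instrumenting two LLM agents is shown below; it assumes an OpenAI-compatible chat endpoint (e.g. a locally served open model), and the personas and topic are invented for the exercise.

    # Two persona-conditioned agents debating for a few turns (illustrative setup).
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")  # assumed local server
    MODEL = "local-model"   # placeholder model name

    def agent_reply(persona, history):
        messages = [{"role": "system", "content": persona}]
        messages += [{"role": "user", "content": turn} for turn in history]
        out = client.chat.completions.create(model=MODEL, messages=messages, temperature=0.7)
        return out.choices[0].message.content

    personas = [
        "You argue that school uniforms should be mandatory. Answer in two sentences.",
        "You argue against mandatory school uniforms. Answer in two sentences.",
    ]
    history = ["Should school uniforms be mandatory?"]
    for turn in range(4):                         # alternate the two agents
        reply = agent_reply(personas[turn % 2], history)
        history.append(reply)
        print(f"Agent {turn % 2 + 1}: {reply}\n")

The transcript produced by such a loop is then the object of analysis, for example for persuasion dynamics or for systematic bias in the positions the agents converge to.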

Dan Tufiș

Short bio:
Member of the Romanian Academy, Professor, Ph.D., Senior Scientist grade I.
Doctor Honoris Causa of the "Agora" University of Oradea.
Honorary Professor of the Faculty of Computer Science, "Alexandru Ioan Cuza" University of Iași, Romania.

European Language Resources Infrastructures: Case Study-ERIC CLARIN
Research Infrastructures (RIs) are constructions requiring large investments that serve researchers in doing their work. Today there are 30 ERICs, in various scientific domains, and new ones are expected to be launched in the future. This talk will provide a brief view of one of the most successful RIs, CLARIN, which Romania will hopefully join soon. CLARIN is dedicated to Open Science, promoting the sharing and re-use of language data and the interoperability of data and services. It promotes comparative perspectives, multidisciplinary collaboration, transnational research, and responsible data science, and it supports linguistic diversity: data covering many languages, tools for many languages, language resources in all modalities, discipline- and language-agnostic. Following the developments of CLARIN, we built an LT portal in the same spirit (but much smaller, mainly for the Romanian language), which will be demonstrated in the hands-on session.

Vasile Păiș

Short bio:
Senior Researcher II, Romanian Academy

RELATE – A Portal for Language Technologies
The students will be introduced to a CLARIN-like collection of tools, resources, and services. RELATE is a modular state-of-the-art platform developed at RACAI (Dr. Păiș, Dr. Ion) that is used for processing written and spoken language (mainly, but not only, Romanian). Resources and technologies were developed in our institute as well as by partner institutions. RELATE is used in multiple national and international research projects. It was designed to use standardized file formats, ensuring interoperability with other language processing systems. Internal functions are available as JSON REST web services. In the Representational State Transfer (REST) architectural style, data and functionality are considered resources and are accessed using Uniform Resource Identifiers (URIs).
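As a generic illustration of the REST style described above, the call below posts a JSON payload to a service and reads the JSON reply; the URI and parameters are hypothetical, not actual RELATE endpoints.

    # Calling a JSON REST web service (hypothetical URI and payload).
    import requests

    resp = requests.post(
        "https://relate.example/api/annotate",    # the URI identifies the processing resource
        json={"text": "Limba română este o limbă romanică.", "lang": "ro"},
    )
    resp.raise_for_status()
    print(resp.json())                            # parsed JSON response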