Francis Kulumba

Ph.D. Candidate in NLP · Research Scientist · ALMAnaCH, Inria Paris · Sorbonne Université

I am a Ph.D. candidate in natural language processing in the ALMAnaCH team at Inria Paris and Sorbonne Université, supervised by Laurent Romary. I am currently a Research Scientist at the French Ministry of Defense, where I contribute to the research and deployment of a domain-specific French embedding model built for the needs of public administrations.

My research focuses on authorship attribution through learned representations of writing style, combining contrastive learning, information retrieval, and mechanistic interpretability. During my Ph.D., I built and released HALvest, a multilingual scholarly corpus, and its contrastive derivative HALvest-Contrastive. I trained embedding models that outperform baselines by a factor of four on stylometric retrieval. I also characterized where authorship signal emerges in encoder-based language models and traced the internal circuits of an 8B-parameter language model to explain how a planted backdoor trigger reroutes its output.

I also enjoy teaching. I co-designed and taught a graduate course in Advanced NLP at EPITA and served as a teaching assistant at Paris 1 Panthéon-Sorbonne.


Releases

resource                        description
almanach/halvest                17-billion-token multilingual scholarly corpus
almanach/halvest-contrastive    authorship attribution benchmark
almanach/camembertv2-base       French RoBERTa-like encoder
almanach/camembertav2-base      French DeBERTaV3-like encoder

Selected publications

  1. HALvest-Contrastive: Retrieval-Like Authorship Attribution with Patch-Level Late Interaction
    Francis Kulumba, Wissam Antoun, Guillaume Vimont, and 2 more authors
    2026
  2. Language-Switching Triggers Take a Latent Detour Through Language Models
    Francis Kulumba, Wissam Antoun, Théo Lasnier, and 2 more authors
    2026
  3. Where Does Authorship Signal Emerge in Encoder-Based Language Models?
    Francis Kulumba, Guillaume Vimont, Laurent Romary, and 1 more author
    2026