Publications
My publications and co-authorships in reverse chronological order.
2026
- HALvest-Contrastive: Retrieval-Like Authorship Attribution with Patch-Level Late Interaction (2026)
Deciding whether two pieces of text share an author is made difficult by topical confound: two writers covering the same topic often look more alike than one writer covering two topics. We tackle this with HALvest, a 17-billion-token multilingual corpus of open-access scholarly papers, and its English contrastive derivative HALvest-Contrastive, in which same-author passages are drawn from distinct papers within a field to minimize topical overlap. We also revisit how documents are compared. Authorship systems traditionally compress each document into a single vector; we instead keep a sequence of vectors and compare them with late interaction, then introduce Patch-Level Late Interaction (PLI), which compresses neighboring tokens into patches before matching. Matching at the sequence level greatly improves performance over the single-vector baseline, but the optimal interaction granularity is subtle.
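The contrast between single-vector scoring, late interaction, and patch-level matching can be illustrated with a minimal pure-Python sketch. It is not the paper's implementation: the function names are hypothetical, the late-interaction score is assumed to be a ColBERT-style MaxSim average, and patches are assumed to be formed by mean-pooling consecutive token vectors.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    # Scale a vector to unit length (zero vectors are returned unchanged).
    n = math.sqrt(dot(v, v))
    return [x / n for x in v] if n > 0 else list(v)

def mean_pool(vectors):
    # Average a list of equal-length vectors into one vector.
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def single_vector_score(doc_a, doc_b):
    # Baseline: compress each document to one vector, then take cosine similarity.
    return dot(normalize(mean_pool(doc_a)), normalize(mean_pool(doc_b)))

def late_interaction_score(doc_a, doc_b):
    # Assumed MaxSim: match each vector of doc_a to its most similar
    # vector in doc_b, then average those best-match similarities.
    a = [normalize(v) for v in doc_a]
    b = [normalize(v) for v in doc_b]
    return sum(max(dot(u, v) for v in b) for u in a) / len(a)

def to_patches(tokens, patch_size):
    # Assumed patching: mean-pool consecutive groups of `patch_size` token vectors.
    return [mean_pool(tokens[i:i + patch_size])
            for i in range(0, len(tokens), patch_size)]

def patch_level_score(doc_a, doc_b, patch_size=2):
    # Late interaction applied to patches instead of individual tokens.
    return late_interaction_score(to_patches(doc_a, patch_size),
                                  to_patches(doc_b, patch_size))

# Toy token embeddings (made-up values, not real model outputs).
doc_a = [[1.0, 0.0], [0.0, 1.0]]
doc_c = [[1.0, 0.0], [1.0, 0.0]]
print(single_vector_score(doc_a, doc_c))
print(late_interaction_score(doc_a, doc_c))
print(patch_level_score(doc_a, doc_c, patch_size=2))
```

In this sketch `patch_size` controls the interaction granularity: a value of 1 reduces to token-level late interaction, while a value equal to the document length reduces to the single-vector baseline.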
@misc{kulumba_halvest_2026,
  title = {HALvest-Contrastive: Retrieval-Like Authorship Attribution with Patch-Level Late Interaction},
  author = {Kulumba, Francis and Antoun, Wissam and Vimont, Guillaume and Romary, Laurent and Cafiero, Florian},
  year = {2026},
  archiveprefix = {arXiv},
  primaryclass = {cs.DL},
  url = {https://arxiv.org/abs/2407.20595},
}
- Triggers Hijack Language Circuits: A Mechanistic Analysis of Backdoor Behaviors in Large Language Models (2026)
Backdoor attacks pose significant security risks for Large Language Models (LLMs), yet the internal mechanisms by which triggers operate remain poorly understood. We present the first mechanistic analysis of language-switching backdoors, studying the GAPperon model family (1B, 8B, 24B parameters) which contains triggers injected during pretraining that cause output language switching. Using activation patching, we localize trigger formation to early layers (7.5-25% of model depth) and identify which attention heads process trigger information. Our central finding is that trigger-activated heads substantially overlap with heads naturally encoding output language across model scales, with Jaccard indices between 0.18 and 0.66 over the top heads identified. This suggests that backdoor triggers do not form isolated circuits but instead co-opt the model’s existing language components. These findings have implications for backdoor defense: detection methods may benefit from monitoring known functional components rather than searching for hidden circuits, and mitigation strategies could potentially leverage this entanglement between injected and natural behaviors.
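The overlap measure mentioned above, the Jaccard index, is the size of the intersection of two sets divided by the size of their union. A minimal sketch follows, assuming heads are identified by hypothetical `(layer, head)` pairs; the example sets are made up for illustration and are not results from the paper.

```python
def jaccard_index(a, b):
    # |A ∩ B| / |A ∪ B|, defined here as 0.0 when both sets are empty.
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)

# Hypothetical (layer, head) identifiers, for illustration only.
trigger_heads = {(2, 1), (3, 4), (5, 0)}
language_heads = {(3, 4), (5, 0), (7, 2), (8, 8)}

# Two shared heads out of five distinct heads overall.
overlap = jaccard_index(trigger_heads, language_heads)
print(overlap)  # 0.4
```

A value of 0 means the two head sets are disjoint and 1 means they are identical, so the reported range of 0.18 to 0.66 indicates partial to substantial overlap.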
@misc{lasnier_triggers_2026,
  title = {Triggers Hijack Language Circuits: A Mechanistic Analysis of Backdoor Behaviors in Large Language Models},
  author = {Lasnier, Théo and Antoun, Wissam and Kulumba, Francis and Seddah, Djamé},
  year = {2026},
  archiveprefix = {arXiv},
  primaryclass = {cs.CL},
  url = {https://arxiv.org/abs/2602.10382},
}
- Language-Switching Triggers Take a Latent Detour Through Language Models (2026)
Backdoor attacks on language models pose a growing security concern, yet the internal mechanisms by which a trigger sequence hijacks model computations remain poorly understood. We identify a circuit underlying a language-switching backdoor in an 8B-parameter autoregressive language model, where a three-word Latin trigger (nine tokens) redirects English output to French. We decompose the circuit into three phases: (1) distributed attention heads at early layers compose the trigger tokens into the last sequence position; (2) the resulting signal propagates through mid-layers in a subspace orthogonal to the model’s natural language-identity direction; (3) the MLP at the final layer converts this latent signal into French logits. The entire circuit flows through a serial bottleneck at a single position: corrupting that position at any layer entirely mitigates the trigger but also hinders the model’s capabilities. The orthogonal latent encoding suggests that defenses that search for language-like signals in intermediate representations would miss this trigger entirely.
@misc{kulumba_language_2026,
  title = {Language-Switching Triggers Take a Latent Detour Through Language Models},
  author = {Kulumba, Francis and Antoun, Wissam and Lasnier, Théo and Sagot, Benoît and Seddah, Djamé},
  year = {2026},
  archiveprefix = {arXiv},
  primaryclass = {cs.CL},
  url = {https://arxiv.org/abs/2605.18646},
}
- Where Does Authorship Signal Emerge in Encoder-Based Language Models? (2026)
Authorship attribution models fine-tuned with the same pretrained encoder, data, and loss can differ four-fold in performance depending only on their scoring mechanism. We use mechanistic interpretability tools to explain this gap. Stylistic features such as word length, punctuation density, and function-word frequency are equally available at every layer in every model, including in an off-the-shelf control encoder, so the gap does not come from representation quality. Instead, causal intervention shows that the scorer determines where the encoder consolidates authorship signal. Mean pooling forces consolidation by early to mid layers, while late interaction defers it to later layers. We further derive this difference from the gradient structure of each scorer, and training dynamics reveal distinct learning trajectories that follow from that difference.
@misc{kulumba_does_2026,
  title = {Where Does Authorship Signal Emerge in Encoder-Based Language Models?},
  author = {Kulumba, Francis and Vimont, Guillaume and Romary, Laurent and Cafiero, Florian},
  year = {2026},
  archiveprefix = {arXiv},
  primaryclass = {cs.CL},
  url = {https://arxiv.org/abs/2605.19908},
}
2024
- CamemBERT 2.0: A Smarter French Language Model Aged to Perfection (2024)
French language models, such as CamemBERT, have been widely adopted across industries for natural language processing (NLP) tasks, with models like CamemBERT seeing over 4 million downloads per month. However, these models face challenges due to temporal concept drift, where outdated training data leads to a decline in performance, especially when encountering new topics and terminology. This issue emphasizes the need for updated models that reflect current linguistic trends. In this paper, we introduce two new versions of the CamemBERT base model, CamemBERTav2 and CamemBERTv2, designed to address these challenges. CamemBERTav2 is based on the DeBERTaV3 architecture and makes use of the Replaced Token Detection (RTD) objective for better contextual understanding, while CamemBERTv2 is built on RoBERTa, which uses the Masked Language Modeling (MLM) objective. Both models are trained on a significantly larger and more recent dataset with longer context length and an updated tokenizer that enhances tokenization performance for French. We evaluate the performance of these models on both general-domain NLP tasks and domain-specific applications, such as medical field tasks, demonstrating their versatility and effectiveness across a range of use cases. Our results show that these updated models vastly outperform their predecessors, making them valuable tools for modern NLP systems. All our new models, as well as intermediate checkpoints, are made openly available on Huggingface.
@misc{antoun_camembert20_2024,
  title = {CamemBERT 2.0: A Smarter French Language Model Aged to Perfection},
  author = {Antoun, Wissam and Kulumba, Francis and Touchent, Rian and de la Clergerie, Éric and Sagot, Benoît and Seddah, Djamé},
  year = {2024},
  archiveprefix = {arXiv},
  primaryclass = {cs.CL},
  url = {https://arxiv.org/abs/2411.08868},
}