WTF-RL
The code of the paper Distinctive Image Captioning: Leveraging Ground Truth Captions in CLIP Guided Reinforcement Learning has been made publicly available to allow reproduction and enable further research on jointly training both models.
CS engineer
ML PhD
Driven by a lifelong passion for science and technology, I quickly recognized the potential of computer science, particularly artificial intelligence. This led me to enroll in an engineering school, where I focused on computer science and acquired sound coding practices. To keep studying science and share it with others, I continued my academic journey with a research-oriented master's degree in computer science, which eventually culminated in a Ph.D. program. Throughout my end-of-studies internship and my Ph.D., I studied multimodality and semantic indexing, aiming to effectively bridge the gap between textual and visual modalities to detect and combat misinformation. Alongside these multimodal aspects, I mostly focused on text generation, with a particular emphasis on cooperative generation, which leverages external models to guide the generation process. Finally, I studied how reinforcement learning is used to train language models, both to integrate it into the cooperative generation paradigm and to study multimodal rewards.
I was invited to do a one-month internship with the MLIA team on the subject of cooperative generation. This collaboration resulted in two studies on the subject.
Supervision of the practical work and the project for the deep learning module taught to 4th-year engineering students, allowing me to deepen some fundamental notions of the field, work on my scientific communication skills and discover teaching.
Supervision of L3 MIAGE students at ISTIC during the data analysis module. Since this module is less advanced than the one I taught at ESIR, it allowed me to focus more on conveying elementary notions and on the basics of teaching.
End-of-studies internship for both my engineering degree and my master's degree in computer science research, carried out at IRISA within the LinkMedia team, on image repurposing detection using multimodal artificial intelligence models. Image repurposing is a particular case of fake news in which an image previously posted online is reused out of context to convey false information. Detecting such cases of misinformation requires jointly processing text and image representations that originally lie in disjoint spaces, and it is in this gap that the difficulty of multimodal studies lies. My attached internship report contains the state of the art in the image repurposing field as well as my contributions, an analysis of the results and directions for future studies.
Realization of a project for the Basel-Mulhouse airport: a responsive and multilingual website for creating and managing requests to access applications. The application has two parts: a form in which people can submit a request, and a portal for intuitive and efficient management of these requests. Working independently allowed me to develop my skills in project management and web development through the design of the data model, the choice of technologies and the structuring of the code.
Rebuilding of the company portfolio website using JSP. Creation of a responsive design, addition of features (real-time filtering, autocompletion, an image editing tool) and redesign of the data model for performance. Creation of an XML parsing application to extract data from log files.
Organisation and management of a team of 15 members to offer student concerts at low prices. Creation of various visuals to promote events (Photoshop, Illustrator, Premiere Pro).
Implementation of an automation tool for creating the documents related to the management of INRS training courses, based on document templates whose placeholders are replaced by the values associated with the current file (name of the training applicant, duration of the courses, price, etc.).
Following my end-of-studies internship on the use of multimodal artificial intelligence models for image repurposing detection, I decided to continue in this field by working on a thesis on the use of multimodal models in the fight against fake news. This CIFRE thesis is carried out in partnership with IMATAG, which works with various journalistic organizations, and with the LinkMedia team of IRISA, which specializes in multimodal studies.
First class honours. During the last year of engineering school, completion of a double degree in computer science research. The courses were oriented towards artificial intelligence in various domains, most specifically multimedia technologies (images, videos and text). They allowed me to acquire both fundamental knowledge and advanced concepts in artificial intelligence, and to discover the research environment through reading and writing papers as well as giving presentations.
Preparatory class and engineering school, IT department. A fast pace that taught me to learn new skills quickly, together with the acquisition of a scientific culture. Concepts learned related to computer engineering:
Deepened my knowledge of entrepreneurship and innovation as well as networks and computer security, discovered multimedia compression methods and learned rigorous software testing methodologies.
The code of the paper "Honey, Tell Me What's Wrong'', Global Explainability of NLP Models through Cooperative Generation has been made publicly to encourage the exploration of cooperative generation as an explanation method.
The code of the paper PPL-MCTS: Constrained Textual Generation Through Discriminator-Guided MCTS Decoding, as well as of the follow-up study Which Discriminator for Cooperative Text Generation?, has been made publicly available on Github to allow reproduction and further experimentation in the domain of cooperative generation and constrained text generation.
For a practical work of INF3405 at Polytechnique Montréal, we created a server that stores files and lets clients download them remotely. This project was written in plain Java without any external libraries. The main points of the project are:
All objectives have been achieved and the final code is fully functional. Since the project was done for school, some useful features were left out to fit the assignment. To be really usable in practice, it should at least store hashed passwords and handle subfolders (a cd function). Source code, executables and documentation are available on Github.
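To illustrate the first of those fixes, here is a minimal sketch of salted password hashing, written in Python rather than the project's Java; the `users` store and the function names are hypothetical:

```python
import hashlib
import os

# Minimal sketch (names are illustrative): instead of keeping passwords in
# plain text, the server stores a random salt and the hash of salt + password.
users = {}  # username -> (salt, hex digest)

def register(username, password):
    salt = os.urandom(16)
    digest = hashlib.sha256(salt + password.encode()).hexdigest()
    users[username] = (salt, digest)

def check(username, password):
    if username not in users:
        return False
    salt, digest = users[username]
    return hashlib.sha256(salt + password.encode()).hexdigest() == digest

register("alice", "hunter2")
assert check("alice", "hunter2") and not check("alice", "wrong")
```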
I made a Javascript connect 4 when I was learning to create web layouts and tried to implement an artificial intelligence, before giving up due to lack of time. After learning more about AI, I decided to try again and implemented a minimax algorithm with alpha-beta pruning. Source code and documentation are available on Github and the project is deployed on my website; do not hesitate to check the code or try the game!
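For reference, the core of such an AI looks like the following sketch, written here in Python for brevity (the actual project is in Javascript); the game-state interface (`is_terminal`, `evaluate`, `moves`, `play`) is a hypothetical abstraction:

```python
import math

# Generic minimax with alpha-beta pruning: branches that the opponent would
# never allow are cut as soon as beta <= alpha.
def minimax(state, depth, alpha=-math.inf, beta=math.inf, maximizing=True):
    if depth == 0 or state.is_terminal():
        return state.evaluate()
    if maximizing:
        best = -math.inf
        for move in state.moves():
            best = max(best, minimax(state.play(move), depth - 1, alpha, beta, False))
            alpha = max(alpha, best)
            if beta <= alpha:  # the opponent would never allow this branch
                break
        return best
    best = math.inf
    for move in state.moves():
        best = min(best, minimax(state.play(move), depth - 1, alpha, beta, True))
        beta = min(beta, best)
        if beta <= alpha:
            break
    return best
```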
Because I did not like the last season of Game of Thrones, I decided (as a joke) to create a neural network able to generate GoT scripts. I used a Recurrent Neural Network (RNN) and trained it on the first 7 seasons. It learnt to generate sentences that look like GoT scripts. The code is available on Github and you can also check the results of a complete generated season.
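A model of this kind boils down to next-character prediction. Below is a minimal, hypothetical PyTorch sketch of such an architecture; the actual repository may differ:

```python
import torch.nn as nn

# Character-level RNN language model: each character is embedded, passed
# through an LSTM, and the head predicts a distribution over the next character.
class CharRNN(nn.Module):
    def __init__(self, vocab_size, hidden_size=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.lstm = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, vocab_size)

    def forward(self, x, state=None):  # x: (batch, seq) of character ids
        h, state = self.lstm(self.embed(x), state)
        return self.head(h), state  # logits over the next character

# Training minimizes cross-entropy between these logits and the input sequence
# shifted by one character; sampling from the logits generates new "scripts".
```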
BibGenerator is a script used to generate a .bib file from a list of paper names. It uses the API made available by DBLP to search for references given the names contained in the list.
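The core lookup could look like the following sketch: the search endpoint and the per-record .bib URLs are DBLP's public API, while the function name and the JSON handling are illustrative assumptions:

```python
import requests

# Look up a paper title on DBLP and fetch the BibTeX of the best match.
def fetch_bibtex(title):
    r = requests.get("https://dblp.org/search/publ/api",
                     params={"q": title, "format": "json", "h": 1})
    hits = r.json()["result"]["hits"]
    if int(hits["@total"]) == 0:
        return None
    key = hits["hit"][0]["info"]["key"]  # e.g. "conf/naacl/ChaffinCK22"
    return requests.get(f"https://dblp.org/rec/{key}.bib").text

with open("papers.txt") as names, open("refs.bib", "w") as out:
    for name in names:
        bib = fetch_bibtex(name.strip())
        if bib:
            out.write(bib + "\n")
```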
Premoji is a master's project on classifying tweets based on the emoji used. The goal was to try different types of models and compare them. We decided to test the following models: Logistic Regression, Random Forests, Multilayer Perceptron, Support Vector Machine, and an LSTM that takes Word2Vec embeddings of the tweets as input.
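For the scikit-learn baselines, the comparison can be set up as in this minimal sketch; the toy data and TF-IDF features are assumptions for illustration, and the LSTM + Word2Vec model requires a separate setup:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy data for illustration; the project used a real tweet corpus.
tweets = ["so funny lol", "crying rn", "I love this", "hilarious stuff",
          "this is so sad", "best day ever", "lmao what", "terrible news",
          "my favourite thing", "what a loss"]
emojis = ["😂", "😢", "❤️", "😂", "😢", "❤️", "😂", "😢", "❤️", "😢"]

for clf in [LogisticRegression(), RandomForestClassifier(),
            MLPClassifier(max_iter=500), LinearSVC()]:
    pipe = make_pipeline(TfidfVectorizer(), clf)
    scores = cross_val_score(pipe, tweets, emojis, cv=2)
    print(f"{type(clf).__name__}: {scores.mean():.2f}")
```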
4th-year engineering school project on the explainability of black-box time series classifiers.
Training image captioning models using teacher forcing results in very generic samples, whereas more distinctive captions can be very useful in retrieval applications or to produce alternative texts describing images for accessibility. Reinforcement Learning (RL) makes it possible to use the cross-modal retrieval similarity score between the generated caption and the input image as a reward to guide the training, leading to more distinctive captions. Recent studies show that pre-trained cross-modal retrieval models can be used to provide this reward, completely eliminating the need for reference captions. However, we argue in this paper that Ground Truth (GT) captions can still be useful in this RL framework. We propose a new image captioning model training strategy that makes use of GT captions in different ways. Firstly, they can be used to train a simple MLP discriminator that serves as a regularization to prevent reward hacking and ensure the fluency of generated captions, resulting in a textual GAN setup extended for multimodal inputs. Secondly, they can serve as additional trajectories in the RL strategy, resulting in a teacher forcing loss weighted by the similarity of the GT to the image. This objective acts as an additional learning signal grounded in the distribution of the GT captions. Thirdly, they can serve as strong baselines when added to the pool of captions used to compute the proposed contrastive reward, reducing the variance of the gradient estimate. Experiments on MS-COCO demonstrate the ability of the proposed training strategy to produce highly distinctive captions while maintaining high writing quality.
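As an illustration of the third point, here is a minimal, hypothetical sketch of such a contrastive reward using a pre-trained CLIP model from the transformers library; the function and the exact reward normalization are assumptions, not the paper's actual code:

```python
import torch
from transformers import CLIPModel, CLIPProcessor

# The generated caption is scored against the image by CLIP, with the GT
# captions added to the pool as strong baselines.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def contrastive_reward(image, generated, gt_captions):
    pool = [generated] + gt_captions  # GT captions act as baselines in the pool
    inputs = processor(text=pool, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        sims = model(**inputs).logits_per_image.squeeze(0)  # one score per caption
    return sims[0] - sims.mean()  # sample similarity relative to the pool average
```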
The task of discerning between generated and natural texts is increasingly challenging. In this context, watermarking emerges as a promising technique for ascribing generated text to a specific model. It alters the sampling generation process so as to leave an invisible trace in the generated output, facilitating later detection. This research consolidates watermarks for large language models based on three theoretical and empirical considerations. First, we introduce new statistical tests that offer robust theoretical guarantees which remain valid even at low false-positive rates (less than 10⁻⁶). Second, we compare the effectiveness of watermarks using classical benchmarks in the field of natural language processing, gaining insights into their real-world applicability. Third, we develop advanced detection schemes for scenarios where access to the LLM is available, as well as multi-bit watermarking.
Venue: In Proceedings of the 2023 IEEE International Workshop on Information Forensics and Security, WIFS 2023
BibTeX: @inproceedings{DBLP:conf/wifs/FernandezCTCF23, author = {Pierre Fernandez and Antoine Chaffin and Karim Tit and Vivien Chappelier and Teddy Furon}, title = {Three Bricks to Consolidate Watermarks for Large Language Models}, booktitle = {{IEEE} International Workshop on Information Forensics and Security, {WIFS} 2023, N{\"{u}}rnberg, Germany, December 4-7, 2023}, pages = {1--6}, publisher = {{IEEE}}, year = {2023}, url = {https://doi.org/10.1109/WIFS58808.2023.10374576}, doi = {10.1109/WIFS58808.2023.10374576}, timestamp = {Tue, 09 Jan 2024 18:03:46 +0100}, biburl = {https://dblp.org/rec/conf/wifs/FernandezCTCF23.bib}, bibsource = {dblp computer science bibliography, https://dblp.org} }
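To make the first point concrete, here is an illustrative sketch of an exact test for one common family of watermarks (the "greenlist" scheme); this toy snippet is an assumption of mine, not the paper's code:

```python
from scipy.stats import binom

# Under the null hypothesis (text not watermarked), each token lands in the
# greenlist with probability gamma, so the green count follows a Binomial law.
# The exact binomial tail below remains valid even at very low false-positive
# rates, where Gaussian z-test approximations break down.
def greenlist_pvalue(num_green, num_tokens, gamma=0.25):
    return binom.sf(num_green - 1, num_tokens, gamma)  # P[X >= num_green]

print(greenlist_pvalue(60, 100))  # 60/100 green tokens: wildly unlikely by chance
```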
The ubiquity of complex machine learning has raised the importance of model-agnostic explanation algorithms. These methods sample artificial instances by slightly perturbing target instances and observing the variations in the model decision. However, such methods require access to initial samples and only provide explanations of the decision for these. To tackle these problems, we propose Therapy, the first global and model-agnostic explanation method adapted to text which requires no input dataset. This method generates texts following the distribution learned by a classifier through cooperative generation. Because it does not rely on initial samples, it can generate explanations in cases where no data is available (e.g., for confidentiality reasons). Moreover, unlike existing methods that combine multiple local explanations into a global one, Therapy offers a global overview of the model behavior on the input space. Our experiments show that despite using no input data to generate samples, Therapy provides insightful information about the features used by the classifier that is competitive with that of methods relying on input samples, and outperforms them when input samples are not specific to the studied model.
Venue: In Proceedings of the 6th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP, BlackboxNLP 2023
BibTeX: @inproceedings{chaffin-delaunay-2023-honey-tell, title = "{``}Honey, Tell Me What{'}s Wrong{''}, Global Explanation of Textual Discriminative Models through Cooperative Generation", author = "Chaffin, Antoine and Delaunay, Julien", editor = "Belinkov, Yonatan and Hao, Sophie and Jumelet, Jaap and Kim, Najoung and McCarthy, Arya and Mohebbi, Hosein", booktitle = "Proceedings of the 6th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP", month = dec, year = "2023", address = "Singapore", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2023.blackboxnlp-1.6", doi = "10.18653/v1/2023.blackboxnlp-1.6", pages = "76--88", }
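As a rough illustration of the idea: once texts have been sampled from each class through classifier-guided (cooperative) generation, a simple linear surrogate can expose the words driving each class. This hypothetical sketch is not the released implementation:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy stand-ins for texts sampled with cooperative generation, one batch per class.
texts = ["cheap awful plastic, broke in two days", "flimsy and disappointing",
         "works great, lovely build quality", "excellent value, very sturdy"]
labels = [0, 0, 1, 1]  # the class that guided each generation

vectorizer = TfidfVectorizer()
surrogate = LogisticRegression().fit(vectorizer.fit_transform(texts), labels)

# Largest positive weights: words the guided generations associate with class 1.
ranked = sorted(zip(vectorizer.get_feature_names_out(), surrogate.coef_[0]),
                key=lambda pair: -pair[1])
print(ranked[:5])
```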
Generative Adversarial Networks (GANs) have seen tremendous success in many continuous generation tasks, especially in the field of image generation. However, for discrete outputs such as language, optimizing GANs remains an open problem with many instabilities, as no gradient can be properly back-propagated from the discriminator output to the generator parameters. An alternative is to learn the generator network via reinforcement learning, using the discriminator signal as a reward, but such a technique suffers from moving rewards and vanishing gradient problems, and often falls short compared to direct maximum-likelihood approaches. In this paper, we introduce Generative Cooperative Networks, in which the discriminator architecture is cooperatively used along with the generation policy to output samples of realistic texts for the task at hand. We give theoretical guarantees of convergence for our approach, and study various efficient decoding schemes to empirically achieve state-of-the-art results in two main NLG tasks.
Venue: In Proceedings of the 39th International Conference on Machine Learning, ICML 2022
BibTeX: @inproceedings{DBLP:conf/icml/LamprierSCCKSP22, author = {Sylvain Lamprier and Thomas Scialom and Antoine Chaffin and Vincent Claveau and Ewa Kijak and Jacopo Staiano and Benjamin Piwowarski}, editor = {Kamalika Chaudhuri and Stefanie Jegelka and Le Song and Csaba Szepesv{\'{a}}ri and Gang Niu and Sivan Sabato}, title = {Generative Cooperative Networks for Natural Language Generation}, booktitle = {International Conference on Machine Learning, {ICML} 2022, 17-23 July 2022, Baltimore, Maryland, {USA}}, series = {Proceedings of Machine Learning Research}, volume = {162}, pages = {11891--11905}, publisher = {{PMLR}}, year = {2022}, url = {https://proceedings.mlr.press/v162/lamprier22a.html}, timestamp = {Tue, 12 Jul 2022 17:36:52 +0200}, biburl = {https://dblp.org/rec/conf/icml/LamprierSCCKSP22.bib}, bibsource = {dblp computer science bibliography, https://dblp.org} }
Language models generate texts by successively predicting probability distributions for the next token given past ones. A growing field of interest tries to leverage external information in the decoding process so that the generated texts have desired properties, such as being more natural, non-toxic, faithful, or having a specific writing style. A solution is to use a classifier at each generation step, resulting in a cooperative environment where the classifier guides the decoding of the language model distribution towards relevant texts for the task at hand. In this paper, we examine three families of (transformer-based) discriminators for this specific task of cooperative decoding: bidirectional, left-to-right and generative ones. We evaluate the pros and cons of these different types of discriminators for cooperative generation, exploring their respective accuracy on classification tasks along with their impact on the resulting sample quality and on computational performance. We also provide the code of a batched implementation of the powerful cooperative decoding strategy used for our experiments, the Monte Carlo Tree Search, working with each discriminator for Natural Language Generation.
Venue: In Proceedings of the 45th International Conference on Research and Development in Information Retrieval, SIGIR 2022
BibTeX: @inproceedings{DBLP:conf/sigir/ChaffinSLSPKC22, author = {Antoine Chaffin and Thomas Scialom and Sylvain Lamprier and Jacopo Staiano and Benjamin Piwowarski and Ewa Kijak and Vincent Claveau}, editor = {Enrique Amig{\'{o}} and Pablo Castells and Julio Gonzalo and Ben Carterette and J. Shane Culpepper and Gabriella Kazai}, title = {Which Discriminator for Cooperative Text Generation?}, booktitle = {{SIGIR} '22: The 45th International {ACM} {SIGIR} Conference on Research and Development in Information Retrieval, Madrid, Spain, July 11 - 15, 2022}, pages = {2360--2365}, publisher = {{ACM}}, year = {2022}, url = {https://doi.org/10.1145/3477495.3531858}, doi = {10.1145/3477495.3531858}, timestamp = {Sat, 09 Jul 2022 09:25:34 +0200}, biburl = {https://dblp.org/rec/conf/sigir/ChaffinSLSPKC22.bib}, bibsource = {dblp computer science bibliography, https://dblp.org} }
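At the heart of the MCTS decoding used in this line of work is a PUCT-style selection rule in which the language model acts as a prior and the discriminator scores feed the value estimates. The following sketch is purely illustrative; the node structure and names are assumptions:

```python
import math

# PUCT selection for discriminator-guided MCTS decoding: the language model
# probability of each token (lm_prior) steers exploration, while the
# discriminator scores accumulated in value_sum steer exploitation.
def select_child(node, c_puct=1.0):
    def puct(child):
        q = child.value_sum / child.visits if child.visits else 0.0
        u = c_puct * child.lm_prior * math.sqrt(node.visits) / (1 + child.visits)
        return q + u
    return max(node.children, key=puct)
```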
Venue: In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2022
BibTeX: @inproceedings{DBLP:conf/naacl/ChaffinCK22, author = {Antoine Chaffin and Vincent Claveau and Ewa Kijak}, editor = {Marine Carpuat and Marie{-}Catherine de Marneffe and Iv{\'{a}}n Vladimir Meza Ru{\'{\i}}z}, title = {{PPL-MCTS:} Constrained Textual Generation Through Discriminator-Guided {MCTS} Decoding}, booktitle = {Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, {NAACL} 2022, Seattle, WA, United States, July 10-15, 2022}, pages = {2953--2967}, publisher = {Association for Computational Linguistics}, year = {2022}, url = {https://doi.org/10.18653/v1/2022.naacl-main.215}, doi = {10.18653/v1/2022.naacl-main.215}, timestamp = {Mon, 01 Aug 2022 16:28:04 +0200}, biburl = {https://dblp.org/rec/conf/naacl/ChaffinCK22.bib}, bibsource = {dblp computer science bibliography, https://dblp.org} }
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
BibTeX: @article{DBLP:journals/corr/abs-2211-05100, author = {Teven Le Scao and Angela Fan and Christopher Akiki and Ellie Pavlick and Suzana Ilic and Daniel Hesslow and Roman Castagn{\'{e}} and Alexandra Sasha Luccioni and Fran{\c{c}}ois Yvon and Matthias Gall{\'{e}} and Jonathan Tow and Alexander M. Rush and Stella Biderman and Albert Webson and Pawan Sasanka Ammanamanchi and Thomas Wang and Beno{\^{\i}}t Sagot and Niklas Muennighoff and Albert Villanova del Moral and Olatunji Ruwase and Rachel Bawden and Stas Bekman and Angelina McMillan{-}Major and Iz Beltagy and Huu Nguyen and Lucile Saulnier and Samson Tan and Pedro Ortiz Suarez and Victor Sanh and Hugo Lauren{\c{c}}on and Yacine Jernite and Julien Launay and Margaret Mitchell and Colin Raffel and Aaron Gokaslan and Adi Simhi and Aitor Soroa and Alham Fikri Aji and Amit Alfassy and Anna Rogers and Ariel Kreisberg Nitzav and Canwen Xu and Chenghao Mou and Chris Emezue and Christopher Klamm and Colin Leong and Daniel van Strien and David Ifeoluwa Adelani and et al.}, title = {{BLOOM:} {A} 176B-Parameter Open-Access Multilingual Language Model}, journal = {CoRR}, volume = {abs/2211.05100}, year = {2022}, url = {https://doi.org/10.48550/arXiv.2211.05100}, doi = {10.48550/arXiv.2211.05100}, eprinttype = {arXiv}, eprint = {2211.05100}, timestamp = {Tue, 15 Nov 2022 15:45:12 +0100}, biburl = {https://dblp.org/rec/journals/corr/abs-2211-05100.bib}, bibsource = {dblp computer science bibliography, https://dblp.org} }
The quality of artificially generated texts has considerably improved with the advent of transformers. The question of using these models to generate learning data for supervised learning tasks naturally arises. In this article, this question is explored under 3 aspects: (i) are artificial data an efficient complement? (ii) can they replace the original data when those are not available or cannot be distributed for confidentiality reasons? (iii) can they improve the explainability of classifiers? Different experiments are carried out on Web-related classification tasks -- namely sentiment analysis on product reviews and Fake News detection -- using data artificially generated by fine-tuned GPT-2 models. The results show that such artificial data can be used to a certain extent but require pre-processing to significantly improve performance. We show that bag-of-words approaches benefit the most from such data augmentation.
Venue: In Proceedings of the 2022 Language Resources and Evaluation Conference, LREC 2022
BibTeX: @InProceedings{claveau-chaffin-kijak:2022:LREC, author = {Claveau, Vincent and Chaffin, Antoine and Kijak, Ewa}, title = {Generating Artificial Texts as Substitution or Complement of Training Data}, booktitle = {Proceedings of the Language Resources and Evaluation Conference}, month = {June}, year = {2022}, address = {Marseille, France}, publisher = {European Language Resources Association}, pages = {4260--4269}, abstract = {The quality of artificially generated texts has considerably improved with the advent of transformers. The question of using these models to generate learning data for supervised learning tasks naturally arises, especially when the original language resource cannot be distributed, or when it is small. In this article, this question is explored under 3 aspects: (i) are artificial data an efficient complement? (ii) can they replace the original data when those are not available or cannot be distributed for confidentiality reasons? (iii) can they improve the explainability of classifiers? Different experiments are carried out on classification tasks - namely sentiment analysis on product reviews and Fake News detection - using artificially generated data by fine-tuned GPT-2 models. The results show that such artificial data can be used in a certain extend but require pre-processing to significantly improve performance. We also show that bag-of-words approaches benefit the most from such data augmentation.}, url = {https://aclanthology.org/2022.lrec-1.453} }
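A minimal sketch of the augmentation setup studied above might look as follows; plain "gpt2" stands in here for the fine-tuned, per-class checkpoints used in the paper:

```python
from transformers import pipeline

# A GPT-2 model fine-tuned on one class generates extra training texts for
# that class; sampling is required to obtain several distinct sequences.
generator = pipeline("text-generation", model="gpt2")
outputs = generator("This product", num_return_sequences=5,
                    max_new_tokens=40, do_sample=True)
synthetic = [out["generated_text"] for out in outputs]
# After pre-processing, these texts are added to the training set of the
# downstream classifier (the paper finds bag-of-words models benefit most).
```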
Large language models have recently been shown to attain reasonable zero-shot generalization on a diverse set of tasks (Brown et al., 2020). It has been hypothesized that this is a consequence of implicit multitask learning in language models’ pretraining (Radford et al., 2019). Can zero-shot generalization instead be directly induced by explicit multitask learning? To test this question at scale, we develop a system for easily mapping any natural language tasks into a human-readable prompted form. We convert a large set of supervised datasets, each with multiple prompts with diverse wording. These prompted datasets allow for benchmarking the ability of a model to perform completely unseen tasks specified in natural language. We fine-tune a pretrained encoder-decoder model (Raffel et al., 2020; Lester et al., 2021) on this multitask mixture covering a wide variety of tasks. The model attains strong zero-shot performance on several datasets, often outperforming models 16× its size. Further, our model attains strong performance on a subset of tasks from the BIG-Bench benchmark, outperforming models 6× its size. All trained models are available at https://github.com/bigscience-workshop/t-zero, and all prompts are available at https://github.com/bigscience-workshop/promptsource.
Venue: In Proceedings of the 2022 International Conference on Learning Representations, ICLR 2022
BibTeX: @inproceedings{sanh2022multitask, title={Multitask Prompted Training Enables Zero-Shot Task Generalization}, author={Victor Sanh and Albert Webson and Colin Raffel and Stephen Bach and Lintang Sutawika and Zaid Alyafeai and Antoine Chaffin and Arnaud Stiegler and Arun Raja and Manan Dey and M Saiful Bari and Canwen Xu and Urmish Thakker and Shanya Sharma Sharma and Eliza Szczechla and Taewoon Kim and Gunjan Chhablani and Nihal Nayak and Debajyoti Datta and Jonathan Chang and Mike Tian-Jian Jiang and Han Wang and Matteo Manica and Sheng Shen and Zheng Xin Yong and Harshit Pandey and Rachel Bawden and Thomas Wang and Trishala Neeraj and Jos Rozen and Abheesht Sharma and Andrea Santilli and Thibault Fevry and Jason Alan Fries and Ryan Teehan and Teven Le Scao and Stella Biderman and Leo Gao and Thomas Wolf and Alexander M Rush}, booktitle={International Conference on Learning Representations}, year={2022}, url={https://openreview.net/forum?id=9Vrb9D0WI4} }
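For reference, the released checkpoints can be queried in a few lines with the transformers library; this sketch follows the usage example from the bigscience/T0pp model card:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Zero-shot inference with a released T0 checkpoint: the task is specified
# entirely in natural language, as described in the abstract above.
tokenizer = AutoTokenizer.from_pretrained("bigscience/T0pp")
model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/T0pp")

prompt = ("Is this review positive or negative? "
          "Review: this is the best cast iron skillet you will ever buy")
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))  # e.g. "Positive"
```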