WTF-RL
The code of the paper Distinctive Image Captioning: Leveraging Ground Truth Captions in CLIP Guided Reinforcement Learning has been made publicly available to allow reproduction and enable further research on jointly training both models.
CS engineer
ML PhD
Driven by a lifelong passion for science and technology, I quickly recognized the potential of computer science, particularly artificial intelligence. This led me to enroll in an engineering school, where I focused on computer science and acquired sound coding practices. To keep studying science and share it with others, I continued my academic journey with a research-oriented master's degree in computer science, which eventually culminated in a Ph.D. program. Throughout my end-of-studies internship and my Ph.D., I studied multimodality and semantic indexing, aiming to effectively bridge the gap between textual and visual modalities to detect and combat misinformation. Alongside these multimodal aspects, I mostly focused on text generation, with a particular emphasis on cooperative generation, which leverages external models to guide the generation process. Finally, I studied how reinforcement learning is used to train language models, both to integrate it into the cooperative generation paradigm and to study multimodal rewards.
I was invited to do a one-month internship with the MLIA team on the subject of cooperative generation. This collaboration resulted in two studies on the subject.
Supervision of the practical work and the project for the deep learning module taught to 4th-year engineering students, allowing me to deepen some fundamental notions of the field, work on my scientific communication skills and discover teaching.
Supervision of L3 MIAGE students at ISTIC during the data analysis module. Since this module is less advanced than the one I taught at ESIR, it allowed me to focus more on conveying elementary notions and on the basics of teaching.
End-of-studies internship for both my engineering degree and my master's degree in computer science research, carried out at IRISA within the LinkMedia team, on image repurposing detection using multimodal artificial intelligence models. Image repurposing is a particular case of fake news in which an image previously posted online is reused out of context to convey false information. Detecting such cases of misinformation requires jointly processing text and image representations that originally lie in disjoint spaces, and it is in this gap that the difficulty of multimodal studies lies. My attached internship report contains the state of the art in the image repurposing field as well as my contributions, an analysis of the results and directions for future studies.
Realization of a project for the Basel-Mulhouse airport: a responsive and multilingual website for creating and managing requests to access applications. The application has two parts: a form in which people can submit a request, and a portal for intuitive and efficient management of these requests. Working independently allowed me to develop my skills in project management and web development through the design of the data model, the choice of technologies and the structuring of the code.
Rebuilding of the company portfolio website using JSP. Creation of a responsive design, addition of features (real-time filtering, autocompletion, an image editing tool) and redesign of the data model for performance. Creation of an XML parsing application to extract data from log files.
Organisation and management of a team of 15 members to offer student concerts at low prices. Creation of various visuals to promote events (Photoshop, Illustrator, Premiere Pro).
Implementation of an automation tool for creating the documents related to the management of INRS training courses, based on document templates whose placeholders are replaced by the values associated with the current file (name of the training applicant, duration of the courses, price, etc.).
Following my end-of-studies internship on the use of multimodal artificial intelligence models for image repurposing detection, I decided to continue in this field by working on a thesis on the use of multimodal models in the fight against fake news. This CIFRE thesis is carried out in partnership with IMATAG, which works with various journalistic organizations, and with the LinkMedia team of IRISA, which specializes in multimodal studies.
First class honours. During the last year of engineering school, completion of a double degree in computer science research. The courses were oriented towards artificial intelligence in various domains, most specifically multimedia technologies (images, videos and text). They allowed me to acquire both fundamental knowledge and advanced concepts in artificial intelligence, and to discover the research environment through reading and writing papers as well as giving presentations.
Preparatory class and engineering school, IT department. A fast pace that taught me to learn new skills quickly, together with the acquisition of a scientific culture. Concepts learned related to computer engineering:
Deepened my knowledge of entrepreneurship and innovation as well as networks and computer security, discovered multimedia compression methods and learned rigorous software testing methodologies.
The code of the paper "Honey, Tell Me What's Wrong'', Global Explainability of NLP Models through Cooperative Generation has been made publicly to encourage the exploration of cooperative generation as an explanation method.
The code of the paper PPL-MCTS: Constrained Textual Generation Through Discriminator-Guided MCTS Decoding, as well as of the follow-up study Which Discriminator for Cooperative Text Generation?, has been made publicly available on Github to allow reproduction and further experimentation in the domain of cooperative generation and constrained text generation.
For a practical work of INF3405 at Polytechnique Montréal, we created a server that stores files and lets clients download them remotely. This project was written in plain Java without any external libraries. The main points of the project are:
All objectives have been achieved and the final code is fully functional. Since the project was done for school, some useful features were left out to fit the assignment. To be really usable in practice, it should at least store hashed passwords and handle subfolders (a cd function). Source code, executables and documentation are available on Github.
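To illustrate the first of those fixes, here is a minimal sketch of salted password hashing, written in Python rather than the project's Java; the `users` store and the function names are hypothetical:

```python
import hashlib
import os

# Minimal sketch (names are illustrative): instead of keeping passwords in
# plain text, the server stores a random salt and the hash of salt + password.
users = {}  # username -> (salt, hex digest)

def register(username, password):
    salt = os.urandom(16)
    digest = hashlib.sha256(salt + password.encode()).hexdigest()
    users[username] = (salt, digest)

def check(username, password):
    if username not in users:
        return False
    salt, digest = users[username]
    return hashlib.sha256(salt + password.encode()).hexdigest() == digest

register("alice", "hunter2")
assert check("alice", "hunter2") and not check("alice", "wrong")
```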
I made a Javascript connect 4 when I was learning to create web layouts and tried to implement an artificial intelligence, before giving up due to lack of time. After learning more about AI, I decided to try again and implemented a minimax algorithm with alpha-beta pruning. Source code and documentation are available on Github and the project is deployed on my website; do not hesitate to check the code or try the game!
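For reference, the core of such an AI looks like the following sketch, written here in Python for brevity (the actual project is in Javascript); the game-state interface (`is_terminal`, `evaluate`, `moves`, `play`) is a hypothetical abstraction:

```python
import math

# Generic minimax with alpha-beta pruning: branches that the opponent would
# never allow are cut as soon as beta <= alpha.
def minimax(state, depth, alpha=-math.inf, beta=math.inf, maximizing=True):
    if depth == 0 or state.is_terminal():
        return state.evaluate()
    if maximizing:
        best = -math.inf
        for move in state.moves():
            best = max(best, minimax(state.play(move), depth - 1, alpha, beta, False))
            alpha = max(alpha, best)
            if beta <= alpha:  # the opponent would never allow this branch
                break
        return best
    best = math.inf
    for move in state.moves():
        best = min(best, minimax(state.play(move), depth - 1, alpha, beta, True))
        beta = min(beta, best)
        if beta <= alpha:
            break
    return best
```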
Because I did not like the last season of Game of Thrones, I decided (as a joke) to create a neural network able to generate GoT scripts. I used a Recurrent Neural Network (RNN) and trained it on the first 7 seasons. It learnt to generate sentences that look like GoT scripts. The code is available on Github and you can also check the results of a complete generated season.
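A model of this kind boils down to next-character prediction. Below is a minimal, hypothetical PyTorch sketch of such an architecture; the actual repository may differ:

```python
import torch.nn as nn

# Character-level RNN language model: each character is embedded, passed
# through an LSTM, and the head predicts a distribution over the next character.
class CharRNN(nn.Module):
    def __init__(self, vocab_size, hidden_size=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.lstm = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, vocab_size)

    def forward(self, x, state=None):  # x: (batch, seq) of character ids
        h, state = self.lstm(self.embed(x), state)
        return self.head(h), state  # logits over the next character

# Training minimizes cross-entropy between these logits and the input sequence
# shifted by one character; sampling from the logits generates new "scripts".
```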
BibGenerator is a script used to generate a .bib file from a list of paper names. It uses the API made available by DBLP to search for references given the names contained in the list.
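The core lookup could look like the following sketch: the search endpoint and the per-record .bib URLs are DBLP's public API, while the function name and the JSON handling are illustrative assumptions:

```python
import requests

# Look up a paper title on DBLP and fetch the BibTeX of the best match.
def fetch_bibtex(title):
    r = requests.get("https://dblp.org/search/publ/api",
                     params={"q": title, "format": "json", "h": 1})
    hits = r.json()["result"]["hits"]
    if int(hits["@total"]) == 0:
        return None
    key = hits["hit"][0]["info"]["key"]  # e.g. "conf/naacl/ChaffinCK22"
    return requests.get(f"https://dblp.org/rec/{key}.bib").text

with open("papers.txt") as names, open("refs.bib", "w") as out:
    for name in names:
        bib = fetch_bibtex(name.strip())
        if bib:
            out.write(bib + "\n")
```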
Premoji is a master's project on classifying tweets based on the emoji used. The goal was to try different types of models and compare them. We decided to test the following models: Logistic Regression, Random Forests, Multilayer Perceptron, Support Vector Machine, and an LSTM that takes Word2Vec embeddings of the tweets as input.
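For the scikit-learn baselines, the comparison can be set up as in this minimal sketch; the toy data and TF-IDF features are assumptions for illustration, and the LSTM + Word2Vec model requires a separate setup:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy data for illustration; the project used a real tweet corpus.
tweets = ["so funny lol", "crying rn", "I love this", "hilarious stuff",
          "this is so sad", "best day ever", "lmao what", "terrible news",
          "my favourite thing", "what a loss"]
emojis = ["😂", "😢", "❤️", "😂", "😢", "❤️", "😂", "😢", "❤️", "😢"]

for clf in [LogisticRegression(), RandomForestClassifier(),
            MLPClassifier(max_iter=500), LinearSVC()]:
    pipe = make_pipeline(TfidfVectorizer(), clf)
    scores = cross_val_score(pipe, tweets, emojis, cv=2)
    print(f"{type(clf).__name__}: {scores.mean():.2f}")
```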
4th-year engineering school project on the explainability of black-box time series classifiers.
Training image captioning models using teacher forcing results in very generic samples, whereas more distinctive captions can be very useful in retrieval applications or to produce alternative texts describing images for accessibility. Reinforcement Learning (RL) makes it possible to use the cross-modal retrieval similarity score between the generated caption and the input image as a reward to guide the training, leading to more distinctive captions. Recent studies show that pre-trained cross-modal retrieval models can be used to provide this reward, completely eliminating the need for reference captions. However, we argue in this paper that Ground Truth (GT) captions can still be useful in this RL framework. We propose a new image captioning model training strategy that makes use of GT captions in different ways. Firstly, they can be used to train a simple MLP discriminator that serves as a regularization to prevent reward hacking and ensure the fluency of generated captions, resulting in a textual GAN setup extended for multimodal inputs. Secondly, they can serve as additional trajectories in the RL strategy, resulting in a teacher forcing loss weighted by the similarity of the GT to the image. This objective acts as an additional learning signal grounded in the distribution of the GT captions. Thirdly, they can serve as strong baselines when added to the pool of captions used to compute the proposed contrastive reward, reducing the variance of the gradient estimate. Experiments on MS-COCO demonstrate the ability of the proposed training strategy to produce highly distinctive captions while maintaining high writing quality.
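As an illustration of the third point, here is a minimal, hypothetical sketch of such a contrastive reward using a pre-trained CLIP model from the transformers library; the function and the exact reward normalization are assumptions, not the paper's actual code:

```python
import torch
from transformers import CLIPModel, CLIPProcessor

# The generated caption is scored against the image by CLIP, with the GT
# captions added to the pool as strong baselines.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def contrastive_reward(image, generated, gt_captions):
    pool = [generated] + gt_captions  # GT captions act as baselines in the pool
    inputs = processor(text=pool, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        sims = model(**inputs).logits_per_image.squeeze(0)  # one score per caption
    return sims[0] - sims.mean()  # sample similarity relative to the pool average
```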
The task of discerning between generated and natural texts is increasingly challenging. In this context, watermarking emerges as a promising technique for ascribing generated text to a specific model. It alters the sampling generation process so as to leave an invisible trace in the generated output, facilitating later detection. This research consolidates watermarks for large language models based on three theoretical and empirical considerations. First, we introduce new statistical tests that offer robust theoretical guarantees which remain valid even at low false-positive rates (less than 10⁻⁶). Second, we compare the effectiveness of watermarks using classical benchmarks in the field of natural language processing, gaining insights into their real-world applicability. Third, we develop advanced detection schemes for scenarios where access to the LLM is available, as well as multi-bit watermarking.
Venue: In Proceedings of the 2023 IEEE International Workshop on Information Forensics and Security, WIFS 2023
BibTeX: @inproceedings{DBLP:conf/wifs/FernandezCTCF23, author = {Pierre Fernandez and Antoine Chaffin and Karim Tit and Vivien Chappelier and Teddy Furon}, title = {Three Bricks to Consolidate Watermarks for Large Language Models}, booktitle = {{IEEE} International Workshop on Information Forensics and Security, {WIFS} 2023, N{\"{u}}rnberg, Germany, December 4-7, 2023}, pages = {1--6}, publisher = {{IEEE}}, year = {2023}, url = {https://doi.org/10.1109/WIFS58808.2023.10374576}, doi = {10.1109/WIFS58808.2023.10374576}, timestamp = {Tue, 09 Jan 2024 18:03:46 +0100}, biburl = {https://dblp.org/rec/conf/wifs/FernandezCTCF23.bib}, bibsource = {dblp computer science bibliography, https://dblp.org} }
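To make the first point concrete, here is an illustrative sketch of an exact test for one common family of watermarks (the "greenlist" scheme); this toy snippet is an assumption of mine, not the paper's code:

```python
from scipy.stats import binom

# Under the null hypothesis (text not watermarked), each token lands in the
# greenlist with probability gamma, so the green count follows a Binomial law.
# The exact binomial tail below remains valid even at very low false-positive
# rates, where Gaussian z-test approximations break down.
def greenlist_pvalue(num_green, num_tokens, gamma=0.25):
    return binom.sf(num_green - 1, num_tokens, gamma)  # P[X >= num_green]

print(greenlist_pvalue(60, 100))  # 60/100 green tokens: wildly unlikely by chance
```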
The ubiquity of complex machine learning has raised the importance of model-agnostic explanation algorithms. These methods sample artificial instances by slightly perturbing target instances and observing the variations in the model decision. However, such methods require access to initial samples and only provide explanations of the decision for these. To tackle these problems, we propose Therapy, the first global and model-agnostic explanation method adapted to text which requires no input dataset. This method generates texts following the distribution learned by a classifier through cooperative generation. Because it does not rely on initial samples, it can generate explanations in cases where no data is available (e.g., for confidentiality reasons). Moreover, unlike existing methods that combine multiple local explanations into a global one, Therapy offers a global overview of the model behavior on the input space. Our experiments show that despite using no input data to generate samples, Therapy provides insightful information about the features used by the classifier that is competitive with that of methods relying on input samples, and outperforms them when input samples are not specific to the studied model.
Venue: In Proceedings of the 6th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP, BlackboxNLP 2023
BibTeX: @inproceedings{chaffin-delaunay-2023-honey-tell, title = "{``}Honey, Tell Me What{'}s Wrong{''}, Global Explanation of Textual Discriminative Models through Cooperative Generation", author = "Chaffin, Antoine and Delaunay, Julien", editor = "Belinkov, Yonatan and Hao, Sophie and Jumelet, Jaap and Kim, Najoung and McCarthy, Arya and Mohebbi, Hosein", booktitle = "Proceedings of the 6th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP", month = dec, year = "2023", address = "Singapore", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2023.blackboxnlp-1.6", doi = "10.18653/v1/2023.blackboxnlp-1.6", pages = "76--88", }
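As a rough illustration of the idea: once texts have been sampled from each class through classifier-guided (cooperative) generation, a simple linear surrogate can expose the words driving each class. This hypothetical sketch is not the released implementation:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy stand-ins for texts sampled with cooperative generation, one batch per class.
texts = ["cheap awful plastic, broke in two days", "flimsy and disappointing",
         "works great, lovely build quality", "excellent value, very sturdy"]
labels = [0, 0, 1, 1]  # the class that guided each generation

vectorizer = TfidfVectorizer()
surrogate = LogisticRegression().fit(vectorizer.fit_transform(texts), labels)

# Largest positive weights: words the guided generations associate with class 1.
ranked = sorted(zip(vectorizer.get_feature_names_out(), surrogate.coef_[0]),
                key=lambda pair: -pair[1])
print(ranked[:5])
```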
Generative Adversarial Networks (GANs) have seen tremendous success in many continuous generation tasks, especially in the field of image generation. However, for discrete outputs such as language, optimizing GANs remains an open problem with many instabilities, as no gradient can be properly back-propagated from the discriminator output to the generator parameters. An alternative is to learn the generator network via reinforcement learning, using the discriminator signal as a reward, but such a technique suffers from moving rewards and vanishing gradient problems, and often falls short compared to direct maximum-likelihood approaches. In this paper, we introduce Generative Cooperative Networks, in which the discriminator architecture is cooperatively used along with the generation policy to output samples of realistic texts for the task at hand. We give theoretical guarantees of convergence for our approach, and study various efficient decoding schemes to empirically achieve state-of-the-art results in two main NLG tasks.
Venue: In Proceedings of the 39th International Conference on Machine Learning, ICML 2022
BibTeX: @inproceedings{DBLP:conf/icml/LamprierSCCKSP22, author = {Sylvain Lamprier and Thomas Scialom and Antoine Chaffin and Vincent Claveau and Ewa Kijak and Jacopo Staiano and Benjamin Piwowarski}, editor = {Kamalika Chaudhuri and Stefanie Jegelka and Le Song and Csaba Szepesv{\'{a}}ri and Gang Niu and Sivan Sabato}, title = {Generative Cooperative Networks for Natural Language Generation}, booktitle = {International Conference on Machine Learning, {ICML} 2022, 17-23 July 2022, Baltimore, Maryland, {USA}}, series = {Proceedings of Machine Learning Research}, volume = {162}, pages = {11891--11905}, publisher = {{PMLR}}, year = {2022}, url = {https://proceedings.mlr.press/v162/lamprier22a.html}, timestamp = {Tue, 12 Jul 2022 17:36:52 +0200}, biburl = {https://dblp.org/rec/conf/icml/LamprierSCCKSP22.bib}, bibsource = {dblp computer science bibliography, https://dblp.org} }
Language models generate texts by successively predicting probability distributions for the next token given past ones. A growing field of interest tries to leverage external information in the decoding process so that the generated texts have desired properties, such as being more natural, non-toxic, faithful, or having a specific writing style. A solution is to use a classifier at each generation step, resulting in a cooperative environment where the classifier guides the decoding of the language model distribution towards relevant texts for the task at hand. In this paper, we examine three families of (transformer-based) discriminators for this specific task of cooperative decoding: bidirectional, left-to-right and generative ones. We evaluate the pros and cons of these different types of discriminators for cooperative generation, exploring their respective accuracy on classification tasks along with their impact on the resulting sample quality and on computational performance. We also provide the code of a batched implementation of the powerful cooperative decoding strategy used for our experiments, the Monte Carlo Tree Search, working with each discriminator for Natural Language Generation.
Venue: In Proceedings of the 45th International Conference on Research and Development in Information Retrieval, SIGIR 2022
BibTeX: @inproceedings{DBLP:conf/sigir/ChaffinSLSPKC22, author = {Antoine Chaffin and Thomas Scialom and Sylvain Lamprier and Jacopo Staiano and Benjamin Piwowarski and Ewa Kijak and Vincent Claveau}, editor = {Enrique Amig{\'{o}} and Pablo Castells and Julio Gonzalo and Ben Carterette and J. Shane Culpepper and Gabriella Kazai}, title = {Which Discriminator for Cooperative Text Generation?}, booktitle = {{SIGIR} '22: The 45th International {ACM} {SIGIR} Conference on Research and Development in Information Retrieval, Madrid, Spain, July 11 - 15, 2022}, pages = {2360--2365}, publisher = {{ACM}}, year = {2022}, url = {https://doi.org/10.1145/3477495.3531858}, doi = {10.1145/3477495.3531858}, timestamp = {Sat, 09 Jul 2022 09:25:34 +0200}, biburl = {https://dblp.org/rec/conf/sigir/ChaffinSLSPKC22.bib}, bibsource = {dblp computer science bibliography, https://dblp.org} }
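At the heart of the MCTS decoding used in this line of work is a PUCT-style selection rule in which the language model acts as a prior and the discriminator scores feed the value estimates. The following sketch is purely illustrative; the node structure and names are assumptions:

```python
import math

# PUCT selection for discriminator-guided MCTS decoding: the language model
# probability of each token (lm_prior) steers exploration, while the
# discriminator scores accumulated in value_sum steer exploitation.
def select_child(node, c_puct=1.0):
    def puct(child):
        q = child.value_sum / child.visits if child.visits else 0.0
        u = c_puct * child.lm_prior * math.sqrt(node.visits) / (1 + child.visits)
        return q + u
    return max(node.children, key=puct)
```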
Venue: In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2022
BibTeX: @inproceedings{DBLP:conf/naacl/ChaffinCK22, author = {Antoine Chaffin and Vincent Claveau and Ewa Kijak}, editor = {Marine Carpuat and Marie{-}Catherine de Marneffe and Iv{\'{a}}n Vladimir Meza Ru{\'{\i}}z}, title = {{PPL-MCTS:} Constrained Textual Generation Through Discriminator-Guided {MCTS} Decoding}, booktitle = {Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, {NAACL} 2022, Seattle, WA, United States, July 10-15, 2022}, pages = {2953--2967}, publisher = {Association for Computational Linguistics}, year = {2022}, url = {https://doi.org/10.18653/v1/2022.naacl-main.215}, doi = {10.18653/v1/2022.naacl-main.215}, timestamp = {Mon, 01 Aug 2022 16:28:04 +0200}, biburl = {https://dblp.org/rec/conf/naacl/ChaffinCK22.bib}, bibsource = {dblp computer science bibliography, https://dblp.org} }
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
BibTeX: @article{DBLP:journals/corr/abs-2211-05100, author = {Teven Le Scao and Angela Fan and Christopher Akiki and Ellie Pavlick and Suzana Ilic and Daniel Hesslow and Roman Castagn{\'{e}} and Alexandra Sasha Luccioni and Fran{\c{c}}ois Yvon and Matthias Gall{\'{e}} and Jonathan Tow and Alexander M. Rush and Stella Biderman and Albert Webson and Pawan Sasanka Ammanamanchi and Thomas Wang and Beno{\^{\i}}t Sagot and Niklas Muennighoff and Albert Villanova del Moral and Olatunji Ruwase and Rachel Bawden and Stas Bekman and Angelina McMillan{-}Major and Iz Beltagy and Huu Nguyen and Lucile Saulnier and Samson Tan and Pedro Ortiz Suarez and Victor Sanh and Hugo Lauren{\c{c}}on and Yacine Jernite and Julien Launay and Margaret Mitchell and Colin Raffel and Aaron Gokaslan and Adi Simhi and Aitor Soroa and Alham Fikri Aji and Amit Alfassy and Anna Rogers and Ariel Kreisberg Nitzav and Canwen Xu and Chenghao Mou and Chris Emezue and Christopher Klamm and Colin Leong and Daniel van Strien and David Ifeoluwa Adelani and et al.}, title = {{BLOOM:} {A} 176B-Parameter Open-Access Multilingual Language Model}, journal = {CoRR}, volume = {abs/2211.05100}, year = {2022}, url = {https://doi.org/10.48550/arXiv.2211.05100}, doi = {10.48550/arXiv.2211.05100}, eprinttype = {arXiv}, eprint = {2211.05100}, timestamp = {Tue, 15 Nov 2022 15:45:12 +0100}, biburl = {https://dblp.org/rec/journals/corr/abs-2211-05100.bib}, bibsource = {dblp computer science bibliography, https://dblp.org} }
The quality of artificially generated texts has considerably improved with the advent of transformers. The question of using these models to generate learning data for supervised learning tasks naturally arises. In this article, this question is explored under 3 aspects: (i) are artificial data an efficient complement? (ii) can they replace the original data when those are not available or cannot be distributed for confidentiality reasons? (iii) can they improve the explainability of classifiers? Different experiments are carried out on Web-related classification tasks -- namely sentiment analysis on product reviews and Fake News detection -- using data artificially generated by fine-tuned GPT-2 models. The results show that such artificial data can be used to a certain extent but require pre-processing to significantly improve performance. We show that bag-of-words approaches benefit the most from such data augmentation.
Venue: In Proceedings of the 2022 Language Resources and Evaluation Conference, LREC 2022
BibTeX: @InProceedings{claveau-chaffin-kijak:2022:LREC, author = {Claveau, Vincent and Chaffin, Antoine and Kijak, Ewa}, title = {Generating Artificial Texts as Substitution or Complement of Training Data}, booktitle = {Proceedings of the Language Resources and Evaluation Conference}, month = {June}, year = {2022}, address = {Marseille, France}, publisher = {European Language Resources Association}, pages = {4260--4269}, abstract = {The quality of artificially generated texts has considerably improved with the advent of transformers. The question of using these models to generate learning data for supervised learning tasks naturally arises, especially when the original language resource cannot be distributed, or when it is small. In this article, this question is explored under 3 aspects: (i) are artificial data an efficient complement? (ii) can they replace the original data when those are not available or cannot be distributed for confidentiality reasons? (iii) can they improve the explainability of classifiers? Different experiments are carried out on classification tasks - namely sentiment analysis on product reviews and Fake News detection - using artificially generated data by fine-tuned GPT-2 models. The results show that such artificial data can be used in a certain extend but require pre-processing to significantly improve performance. We also show that bag-of-words approaches benefit the most from such data augmentation.}, url = {https://aclanthology.org/2022.lrec-1.453} }
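A minimal sketch of the augmentation setup studied above might look as follows; plain "gpt2" stands in here for the fine-tuned, per-class checkpoints used in the paper:

```python
from transformers import pipeline

# A GPT-2 model fine-tuned on one class generates extra training texts for
# that class; sampling is required to obtain several distinct sequences.
generator = pipeline("text-generation", model="gpt2")
outputs = generator("This product", num_return_sequences=5,
                    max_new_tokens=40, do_sample=True)
synthetic = [out["generated_text"] for out in outputs]
# After pre-processing, these texts are added to the training set of the
# downstream classifier (the paper finds bag-of-words models benefit most).
```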
Large language models have recently been shown to attain reasonable zero-shot generalization on a diverse set of tasks (Brown et al., 2020). It has been hypothesized that this is a consequence of implicit multitask learning in language models’ pretraining (Radford et al., 2019). Can zero-shot generalization instead be directly induced by explicit multitask learning? To test this question at scale, we develop a system for easily mapping any natural language tasks into a human-readable prompted form. We convert a large set of supervised datasets, each with multiple prompts with diverse wording. These prompted datasets allow for benchmarking the ability of a model to perform completely unseen tasks specified in natural language. We fine-tune a pretrained encoder-decoder model (Raffel et al., 2020; Lester et al., 2021) on this multitask mixture covering a wide variety of tasks. The model attains strong zero-shot performance on several datasets, often outperforming models 16× its size. Further, our model attains strong performance on a subset of tasks from the BIG-Bench benchmark, outperforming models 6× its size. All trained models are available at https://github.com/bigscience-workshop/t-zero, and all prompts are available at https://github.com/bigscience-workshop/promptsource.
Venue: In Proceedings of the 2022 International Conference on Learning Representations, ICLR 2022
BibTeX: @inproceedings{sanh2022multitask, title={Multitask Prompted Training Enables Zero-Shot Task Generalization}, author={Victor Sanh and Albert Webson and Colin Raffel and Stephen Bach and Lintang Sutawika and Zaid Alyafeai and Antoine Chaffin and Arnaud Stiegler and Arun Raja and Manan Dey and M Saiful Bari and Canwen Xu and Urmish Thakker and Shanya Sharma Sharma and Eliza Szczechla and Taewoon Kim and Gunjan Chhablani and Nihal Nayak and Debajyoti Datta and Jonathan Chang and Mike Tian-Jian Jiang and Han Wang and Matteo Manica and Sheng Shen and Zheng Xin Yong and Harshit Pandey and Rachel Bawden and Thomas Wang and Trishala Neeraj and Jos Rozen and Abheesht Sharma and Andrea Santilli and Thibault Fevry and Jason Alan Fries and Ryan Teehan and Teven Le Scao and Stella Biderman and Leo Gao and Thomas Wolf and Alexander M Rush}, booktitle={International Conference on Learning Representations}, year={2022}, url={https://openreview.net/forum?id=9Vrb9D0WI4} }
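For reference, the released checkpoints can be queried in a few lines with the transformers library; this sketch follows the usage example from the bigscience/T0pp model card:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Zero-shot inference with a released T0 checkpoint: the task is specified
# entirely in natural language, as described in the abstract above.
tokenizer = AutoTokenizer.from_pretrained("bigscience/T0pp")
model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/T0pp")

prompt = ("Is this review positive or negative? "
          "Review: this is the best cast iron skillet you will ever buy")
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))  # e.g. "Positive"
```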