Jaisidh Singh

I am a final-year undergraduate student at IIT Jodhpur majoring in AI and Data Science.

I work on a wide range of machine learning and deep learning problems. Specifically, I am most interested in robust vision-language reasoning and understanding through handcrafted algorithmic improvements as well as scaling up model capacity. Besides this, my research and interests also touch upon interpretability and responsible AI.

Email  /  CV  /  Scholar  /  Twitter  /  LinkedIn  /  Github


Publications


Learn "No" To Say "Yes" Better: Improving Vision-Language Models via Negations
Jaisidh Singh, Ishaan Shrivastava, Mayank Vatsa, Richa Singh, Aparna Bharati
arXiv Preprint / GitHub

VLMs like CLIP often ignore the effect of negation words such as "no" and "not". To address this, we propose novel modifications to the CLIP loss that use our high-quality dataset of negations, yielding significant gains in image classification and general-purpose compositionality.


SynthProv: Interpretable Framework for Profiling Identity Leakage
Jaisidh Singh, Harshil Bhatia, Mayank Vatsa, Richa Singh, Aparna Bharati
WACV, 2024
Paper / Poster / Slides

GANs remember identities seen during training and leak those features across their representations (identity leakage). Our framework makes it possible to detect and trace identity leakage across synthetic images and the GAN's latent space.


IdProv: Identity-Based Provenance for Synthetic Image Generation (Student Abstract)
Harshil Bhatia*, Jaisidh Singh*, Aparna Bharati, Richa Singh, Mayank Vatsa
AAAI, 2023
Paper / Poster

Highlights the motivation for analyzing identity leakage and the threat that such leakage poses to privacy.

Experience



Research Associate, Jan 2024 - Present
CERC-AAI Labs, MILA, Montreal, Canada
With Diganta Mishra, Dr. Irina Rish
Project Domain: Continual learning, Mixture-of-Experts, Transformers

Undergraduate Researcher, July 2023 - March 2024
Trusted AI Lab, IIT Jodhpur, Rajasthan, India
With Dr. Richa Singh, Dr. Mayank Vatsa, and Dr. Aparna Bharati
Project Domain: Vision-Language models, Compositionality, Contrastive learning

Computer Vision Research Engineer, May 2023 - Jan 2024
Bosch Research India, Madison, WI
With Dr. Amit Arvind Kale, Sonam Singh
Project Domain: Systematic errors, Semantic segmentation, Zero-shot foundation models

Undergraduate Researcher, May 2022 - Jan 2023
Trusted AI Lab, IIT Jodhpur, Rajasthan, India
With Dr. Richa Singh, Dr. Mayank Vatsa, and Dr. Aparna Bharati
Project Domain: Face-GANs, Latent space semantic encoding, Identity leakage, Privacy

Computer Vision Research Engineer, May 2022 - July 2022
Bosch Research India, Madison, WI
With Sonam Singh
Project Domain: Multimodal image retrieval, Vision transformers, Prompting for Vision-Language models

Projects

Some of my personal projects can be found here.


Latent Diffusion Inpainting
Code

A modular, plug-and-play organization of my tinkering with diffusion. Supports flexible training and inference for DDPMs and LDMs on Stanford Cars and LSUN Dining Rooms, plus Stable Diffusion inference. Adds grounded inpainting and generation with Stable Diffusion and GLIGEN.


UV-Summ: Transformer U-Net for Video Summarization
Code

An architecture inspired by U-Net that scores CLIP frame features to extract key-frames. Includes a zero-shot text summarizer using CLIP embeddings of the extracted key-frames. Our results outperformed recent approaches on the TVSum dataset, and this was presented as the course project for Deep Learning 2023 @ IIT Jodhpur.


Beyond Token Limits: Inference-time Optimization for Large Document Summarization
Code / Deployed

A high-utility project for summarizing large articles in a purely inference-based, plug-and-play manner. It uses hierarchical sentence clustering for extractive summarization and was presented as the DL-Ops project for DL 2023 @ IIT Jodhpur.


LowResFormer: A Transformer for Low Resolution Fine Grained Image Classification
Code

A vision transformer architecture developed as my first self-research endeavour. It classifies images using multimodal inputs: the images together with their attribute information. Outperformed previous approaches on the AwA2 dataset at all resolutions.


Deep RL: Agents on Gym and Custom Environments
Code

A project of self-implemented RL algorithms on OpenAI's Gym to gain an understanding of this field. Implemented the DQN and Permutation Invariant Sensory Transformer papers, and used this learning to apply DQN for the Course Project for Advanced Artificial Intelligence 2023 @ IIT Jodhpur.

Extras

I am an avid reader and enjoy fiction and non-fiction, especially mythology and philosophy. I am also an experienced guitarist and vocalist, and have performed at large-scale events like IGNUS and the Inter-IIT Cultural Meet. As a Core Member and Mentor of Sangam Music Society @ IIT Jodhpur, I have guided 3 batches of the society's musicians. Personally, I am drawn to collaborations, leadership roles, and positions that allow me to help people. Being a Student Guide @ SWC IITJ (2021) allowed me to closely mentor 10 students and handle a junior batch of 500 students with a team of 45. My varied interests and skills also made me a Core Member (2021) of Quiz Society, Literature Society, and DevlUp Labs @ IITJ.


Website template taken from Jon Barron's github repository.