Research
I'm interested in building trustworthy large language models that are capable of reasoning robustly. I am particularly excited about teaching models to express their uncertainty, reason in consistent ways, perform long-horizon planning, and continually adapt to real-world signals.
|
|
Llamas Know What GPTs Don't Show: Surrogate Models for Confidence Estimation
Vaishnavi Shrivastava, Percy Liang, Ananya Kumar
arXiv, 2023
arXiv
We study whether the uncertainty of black-box models can be approximated through open, white-box models, and find that confidence scores transfer from weaker, white-box models like Llama 2 to stronger, black-box models like GPT-4. We also find that different LLMs tend to make similar mistakes, which may be what enables this transfer of ingrained uncertainty.
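A minimal sketch of the surrogate idea, assuming a Hugging Face causal LM as the white-box surrogate and a simple Q/A prompt template (both illustrative, not the paper's exact recipe): the surrogate's average token probability on the black-box model's answer serves as the confidence estimate.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

SURROGATE = "meta-llama/Llama-2-7b-hf"  # assumed surrogate; any open causal LM works
tokenizer = AutoTokenizer.from_pretrained(SURROGATE)
model = AutoModelForCausalLM.from_pretrained(SURROGATE)
model.eval()

def surrogate_confidence(question: str, answer: str) -> float:
    """Average probability the surrogate assigns to `answer`, given `question`."""
    prompt = f"Q: {question}\nA:"
    # Assumes the prompt's tokenization is a prefix of the full tokenization,
    # which holds for typical prompts like this one.
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + " " + answer, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # The token at position i is predicted by the logits at position i - 1.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = full_ids[0, 1:]
    answer_slice = slice(prompt_len - 1, None)  # predictions for the answer tokens
    token_log_probs = log_probs[answer_slice].gather(
        1, targets[answer_slice].unsqueeze(1)
    ).squeeze(1)
    return token_log_probs.exp().mean().item()

# Usage: score an answer produced by the stronger black-box model (e.g. via its API).
print(surrogate_confidence("What is the capital of Australia?", "Canberra"))
```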
|
|
Benchmarking and Improving Generator-Validator Consistency of Language Models
Xiang Lisa Li, Vaishnavi Shrivastava, Siyan Li, Tatsunori Hashimoto, Percy Liang
ICLR, 2024
arXiv
We find that models tend to be inconsistent when generating an answer (responding with '15' to 'What is 7+8?') versus when validating that same answer (saying 'No' to 'Is 7+8=15?') - a behavior we term 'Generator-Validator (GV) inconsistency'. We introduce consistency fine-tuning, a self-supervised approach to reducing GV inconsistency in language models.
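A minimal sketch of a GV consistency check, assuming `lm` is any text-in/text-out interface to the model under evaluation; the prompt wording is illustrative rather than the paper's exact templates.
```python
def gv_consistent(lm, question: str) -> bool:
    # Generator side: ask the model to produce an answer.
    answer = lm(f"{question}\nAnswer:").strip()
    # Validator side: ask the same model to judge its own answer.
    verdict = lm(
        f"{question}\nProposed answer: {answer}\n"
        "Is this answer correct? Reply Yes or No."
    ).strip()
    # Consistent iff the validator endorses what the generator produced.
    return verdict.lower().startswith("yes")

def gv_consistency_rate(lm, questions) -> float:
    """Fraction of questions where the validator agrees with the generator."""
    questions = list(questions)
    return sum(gv_consistent(lm, q) for q in questions) / len(questions)
```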
|
|
Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs
Shashank Gupta, Vaishnavi Shrivastava, Ameet Deshpande, Ashwin Kalyan, Peter Clark, Ashish Sabharwal, Tushar Khot
ICLR, 2024
arXiv
We show that LLMs provide biased simulations of human behavior and exhibit stereotypical and harmful reasoning patterns while adopting personas of different socio-demographic groups. For instance, ChatGPT frequently makes limiting and incorrect assumptions about a physically-disabled person ("As a physically-disabled person, I can't move and thus I am not good at math.").
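A minimal sketch of persona-assigned evaluation, assuming a generic `lm` callable and an illustrative persona template (not the paper's exact instructions): prepend a persona to each reasoning question and compare accuracy across personas.
```python
def persona_prompt(persona: str, question: str) -> str:
    # Illustrative persona instruction; the exact wording is an assumption.
    return (
        f"Adopt the identity of {persona}. "
        f"Answer the question while staying in character.\n{question}"
    )

def accuracy_by_persona(lm, personas, dataset):
    """`dataset` is a list of (question, gold_answer) pairs; `lm` maps a prompt to a response string."""
    scores = {}
    for persona in personas:
        correct = sum(
            gold.lower() in lm(persona_prompt(persona, q)).lower()
            for q, gold in dataset
        )
        scores[persona] = correct / len(dataset)
    # Large accuracy gaps between personas on identical questions point to
    # persona-induced reasoning bias rather than question difficulty.
    return scores
```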
|
|
UserIdentifier: Implicit User Representations for Simple and Effective Personalized Sentiment Analysis
Fatemehsadat Mireshghallah, Vaishnavi Shrivastava, Milad Shokouhi, Taylor Berg-Kirkpatrick, Robert Sim, Dimitrios Dimitriadis
NAACL, 2022
arXiv
We investigate using non-trainable, user-specific prompts for user personalization, instead of trainable user embeddings. We demonstrate that this approach outperforms SOTA prefix-tuning-based results on a suite of sentiment analysis datasets by up to 13%.
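A minimal sketch of the idea, assuming the identifier is a short string of arbitrary tokens derived deterministically from the user id (an illustrative choice of identifier format): the same non-trainable string is prepended to every input from that user at both fine-tuning and inference time, so no per-user parameters are learned.
```python
import random
import string

def make_user_identifier(user_id: str, num_tokens: int = 4, seed: int = 0) -> str:
    """Deterministically map a user id to a short, arbitrary token string."""
    rng = random.Random(f"{seed}-{user_id}")
    return " ".join(
        "".join(rng.choices(string.ascii_lowercase, k=5)) for _ in range(num_tokens)
    )

def personalize(user_id: str, text: str) -> str:
    # The model sees the same identifier across all of a user's examples and can
    # associate it with that user's style and label preferences.
    return f"{make_user_identifier(user_id)} {text}"

# Usage: feed `personalize(user, review)` to an otherwise standard sentiment classifier.
print(personalize("user42", "The battery life is disappointing."))
```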
|
|
Exploring Low-Cost Transformer Model Compression for Large-Scale Commercial Reply Suggestions
Vaishnavi Shrivastava, Radhika Gaonkar, Shashank Gupta, Abhishek Jha
arXiv, 2021
arXiv
We study low-cost methods to compress a Transformer bi-encoder based reply suggestion system, reducing training and inference times by 42% and 35%, respectively. We investigate how dataset size, the use of a pre-trained model, and domain adaptation of the pre-trained model affect the performance of these compression techniques.
|