Natural Language Processing

This project explores whether Large Language Models (LLMs) can accurately simulate user behavior in Reddit communities. Focusing on the r/science subreddit, we investigate whether LLMs can mimic the communication patterns of specific users when given their comment history as context. Authors: Vedaant Jain*, Yoshee Jain*, Ishq Gupta, Aditi Shrivastava, Koustuv Saha, Eshwar Chandrasekharan

May 10, 2024

Curriculum Learning for Embodied Planning with LLMs

This project applies Curriculum Learning to improve the performance of GPT-2 models on Embodied Natural Language Processing tasks using the ALFWorld dataset. We developed curricula for both the Action Modeling and Reinforcement Learning stages, demonstrating significant improvements in task success rates and action efficiency. Authors: Bohan Liu, Vedaant Jain, Aarohi Gupta

May 10, 2024

Multi-Modal Information Extraction from Academic Resumes

This project addresses the challenge of extracting structured information from academic resumes, which often span multiple pages and contain complex, domain-specific content. We developed a novel approach that combines document layout analysis with sequence tagging to segment resumes into sections and extract key information from each.

May 10, 2023