Multi-Modal Information Extraction from Academic Resumes

Wed, 10 May 2023 00:00:00 +0000

This project addresses the challenge of extracting structured information from academic resumes, which often span multiple pages and contain complex, domain-specific content. We developed a novel approach that combines document layout analysis with sequence tagging to segment resumes into sections and accurately extract key information from each section.

Key aspects of this research include:

Using the Document Image Transformer (DiT) to detect section titles and split each resume into sections
Implementing BERT-based sequence tagging models to extract information from specific sections (education, employment, publications)
Creating a labeled dataset of 30+ academic resumes (250+ pages) for model training and evaluation
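
The two stages above can be illustrated with a minimal sketch. It is not the project's actual code. It assumes a layout model (DiT in this project) has already flagged which lines are section titles, and that a tagging model has already assigned BIO tags to the tokens. The names Line, split_into_sections, and decode_bio, the tag labels (DEGREE, FIELD, INSTITUTION, YEAR), and the sample resume text are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Line:
    text: str
    is_title: bool  # assumed to come from a layout model such as DiT


def split_into_sections(lines):
    """Group each non-title line under the most recently seen title."""
    sections = {}
    current = "header"  # bucket for text that appears before the first title
    for line in lines:
        if line.is_title:
            current = line.text.strip().lower()
            sections.setdefault(current, [])
        else:
            sections.setdefault(current, []).append(line.text)
    return sections


def decode_bio(tokens, tags):
    """Collapse per-token BIO tags into (label, text) spans."""
    spans = []
    label, words = None, []
    for token, tag in zip(tokens, tags):
        starts_span = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != label
        )
        if starts_span:
            # Close any open span, then open a new one.
            if label is not None:
                spans.append((label, " ".join(words)))
            label, words = tag[2:], [token]
        elif tag.startswith("I-"):
            words.append(token)
        else:
            # An "O" tag closes any open span.
            if label is not None:
                spans.append((label, " ".join(words)))
            label, words = None, []
    if label is not None:
        spans.append((label, " ".join(words)))
    return spans


# Hypothetical input: lines of one resume with title flags already assigned.
lines = [
    Line("Jane Doe", False),
    Line("Education", True),
    Line("PhD Computer Science Example University 2020", False),
    Line("Employment", True),
    Line("Assistant Professor, Example University", False),
]
sections = split_into_sections(lines)

# Hypothetical tagger output for the single line in the education section.
tokens = sections["education"][0].split()
tags = ["B-DEGREE", "B-FIELD", "I-FIELD", "B-INSTITUTION", "I-INSTITUTION", "B-YEAR"]
fields = decode_bio(tokens, tags)
```

Splitting by title first means each section can be passed to a tagger trained on that section type, so the education tagger never has to handle publication entries.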

Information Extraction | Vedaant Jain
