<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Information Extraction | Vedaant Jain</title><link>https://vedaantjain.netlify.app/tags/information-extraction/</link><atom:link href="https://vedaantjain.netlify.app/tags/information-extraction/index.xml" rel="self" type="application/rss+xml"/><description>Information Extraction</description><generator>Hugo Blox Builder (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Wed, 10 May 2023 00:00:00 +0000</lastBuildDate><image><url>https://vedaantjain.netlify.app/media/icon_hu68170e94a17a2a43d6dcb45cf0e8e589_3079_512x512_fill_lanczos_center_3.png</url><title>Information Extraction</title><link>https://vedaantjain.netlify.app/tags/information-extraction/</link></image><item><title>Multi-Modal Information Extraction from Academic Resumes</title><link>https://vedaantjain.netlify.app/project/resumeextraction/</link><pubDate>Wed, 10 May 2023 00:00:00 +0000</pubDate><guid>https://vedaantjain.netlify.app/project/resumeextraction/</guid><description>&lt;p>This project addresses the challenge of extracting structured information from academic resumes, which often span multiple pages and contain complex, domain-specific content. We developed a novel approach that combines document layout analysis, which segments each resume into its sections, with sequence tagging, which extracts key information from the segmented sections.&lt;/p>
&lt;p>Key aspects of this research include:&lt;/p>
&lt;ul>
&lt;li>Using the Document Image Transformer (DiT) to detect section titles and divide resumes into sections&lt;/li>
&lt;li>Applying BERT-based sequence tagging models to extract information from specific sections (education, employment, publications)&lt;/li>
&lt;li>Creating a labeled dataset of 30+ academic resumes (250+ pages) for model training and evaluation&lt;/li>
&lt;/ul></description></item></channel></rss>