Built a Python crawler on Google Cloud, processing over 10,000,000 academic papers from European universities, extracting text from PDFs/images, and eliminating duplicates; deployed in production for plagiarism detection.
Built an admin panel with React, enabling scheduling of crawls, domain management, status monitoring, and data visualization, streamlining administrative workflows.
Optimized and scaled the system for production use, enabling it to handle 10TB+ of academic data for real-time analysis and plagiarism checks.