VRAJ PATEL

Experience

My professional background in software engineering, systems automation, and machine learning research.

Research Assistant (Data Engineering & ML)

March 2023 – Present

Princeton University – Studio Lab

  • Political Data Pipeline (ETL): Engineered a cloud-hybrid pipeline to ingest 40+ GB of election speeches (Modi & Gandhi, 2014/2019) and archive 40+ years (1981–2024) of Lok Sabha debates, creating the largest unified dataset for Indian political linguistic analysis
  • ML & Cloud Optimization: Architected an automated AWS workflow (S3, Transcribe, Translate) using Boto3, implementing Custom Language Models (CLM) to recognize niche political entities and reduce word error rate (WER)
  • Web Scraping System: Developed a resumable Selenium crawler with SQLite state management to index the Parliament Digital Library, implementing logic to handle dynamic pagination and sync with OneDrive storage
  • Unstructured Data Parsing: Designed a text extraction engine using PyMuPDF and FuzzyWuzzy (fuzzy string matching) to structure thousands of raw PDF statements, mapping OCR text to standardized Ministry entities

Systems Integration Engineer (Automation & Internal Tools)

May 2025 – Present

CU Boulder Institute of Behavioral Science

  • Developing a full-stack ticket classification system using FastAPI and PostgreSQL, architecting a microservices-based solution to automate incident tagging via a fine-tuned BERT transformer model
  • Developed an interactive analytics dashboard using React (Vite) and Recharts, with state management to visualize historical ticket data and surface operational trends, such as peak ticket volume on Tuesdays at 10 AM, so that staffing schedules could be optimized proactively
  • Engineered a PowerShell automation tool to recursively scan the IBS OU under Colorado AD and purge group memberships, reducing per-user offboarding time by 93% (15 mins to <1 min)
  • Enforced Secure Compute compliance standards across 50+ endpoints by implementing Windows Autopilot and Jamf Pro enrollment workflows

Undergraduate Research Assistant - Satellite Telemetry Data Analysis

August 2024 – May 2025

The Data Mine – Purdue University @ L3Harris

  • Developed machine learning models to detect and identify cyberattack-based anomalies using satellite telemetry data from the NASA Operational Simulator for Small Satellites (NOS3)
  • Analyzed and modeled potential space-cyber threats using the SPARTA Matrix and MITRE ATT&CK frameworks, creating synthetic telemetry data to simulate various cyberattack scenarios
  • Presented research findings at the Data Mine of the Rockies Symposium to stakeholders from the US Space Force, Lockheed Martin, CrowdStrike, and L3Harris

Lead Technical Research Assistant

March 2023 – May 2025

University of Colorado Boulder – Institute of Behavioral Science

  • Computer Vision Pipeline (PyTorch/ResNet50): Collaborated with PhD researchers in weekly Agile sprints to architect a visual bias analysis pipeline for NYT COVID-19 imagery. Engineered a custom preprocessing workflow using OpenCV and scikit-image to perform Z-score normalization and channel-wise intensity rescaling
  • Unsupervised Learning: Applied transfer learning by using a truncated ResNet50 model to extract high-dimensional feature embeddings, which were fed into K-Means clustering to uncover latent patterns in media datasets without relying on labeled data
  • ETL Architecture: Engineered a resilient data pipeline to aggregate 15+ years of legislative data. Built a custom Selenium and BeautifulSoup scraper to navigate dynamic DOM elements, implementing JSON checkpointing to ensure data integrity during long-running jobs
  • API Optimization: Developed a Python wrapper for the LegiScan API with in-memory caching and rate-limit handling, reducing redundant network requests by 40% during bulk data ingestion