Data

Data pipelines, scraping, ETL, and large-scale data systems

Data Engineer @ Open Project @ Berkeley

Aug 2024 — Jan 2025
  • Built an async web scraper that issues 10,000+ automated HTTP requests per run, collecting 100K+ data points
  • Reduced end-to-end collection time by ~100x and structured 600+ classroom schedules into JSON availability models
  • Cleaned and normalized 500K+ user-submitted ingredient entries, increasing recognized inputs from <20% to >95%
Python · Playwright · Pandas
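A common way to get large speedups from an async scraper is to run requests concurrently while capping how many are in flight at once. The sketch below shows that bounded-concurrency pattern with asyncio.Semaphore. It is an illustration under stated assumptions, not the project's actual code: fetch_page is a hypothetical stand-in that only sleeps to simulate latency, where a real run would call Playwright or an HTTP client.

```python
import asyncio


# Hypothetical stand-in for a real page fetch (e.g. a Playwright or HTTP client call).
async def fetch_page(url: str) -> dict:
    await asyncio.sleep(0.01)  # simulate network latency
    return {"url": url, "status": 200}


async def scrape_all(urls: list[str], max_concurrency: int = 50) -> list[dict]:
    """Fetch every URL concurrently, with at most max_concurrency requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(url: str) -> dict:
        # Each task waits here until a concurrency slot is free.
        async with semaphore:
            return await fetch_page(url)

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded_fetch(u) for u in urls))


if __name__ == "__main__":
    urls = [f"https://example.com/schedule/{i}" for i in range(200)]
    results = asyncio.run(scrape_all(urls, max_concurrency=20))
    print(len(results))
```

The semaphore limit is the main tuning knob. Set it too low and runs stay slow. Set it too high and the target server may throttle or block the scraper.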

Data Scientist @ Synopsys (Contract)

Aug 2025 — Dec 2025
  • Transformed raw chip netlist data into graph representations for GNN training across ~50–100 designs
  • Built data pipelines with PyTorch Geometric to construct and featurize netlist graphs for attention-based GNN layers
  • Benchmarked inference speed and model scalability across varying design complexities
Python · PyTorch Geometric · Pandas · ETL
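One way to turn a netlist into GNN input is to treat each cell instance as a node and connect any two cells that share a net (clique expansion). The sketch below uses only the standard library to build one-hot node features and an edge list in the [2, num_edges] layout PyTorch Geometric expects for edge_index. The netlist format, the NETLIST sample, and the clique-expansion choice are all assumptions for illustration. They are not Synopsys data or the exact method used on the contract.

```python
from itertools import combinations

# Hypothetical, simplified netlist: instance name -> (cell type, nets it touches).
NETLIST = {
    "u1": ("NAND2", ["n1", "n2", "n3"]),
    "u2": ("INV", ["n3", "n4"]),
    "u3": ("DFF", ["n4", "n1"]),
}


def netlist_to_graph(netlist):
    """Return (node names, one-hot features, edge_index) via clique expansion of nets."""
    names = sorted(netlist)
    index = {name: i for i, name in enumerate(names)}
    cell_types = sorted({cell for cell, _ in netlist.values()})
    type_index = {t: i for i, t in enumerate(cell_types)}

    # One-hot encode each node's cell type as its feature vector.
    x = []
    for name in names:
        row = [0.0] * len(cell_types)
        row[type_index[netlist[name][0]]] = 1.0
        x.append(row)

    # Group node indices by the nets they connect to.
    net_to_cells = {}
    for name, (_, nets) in netlist.items():
        for net in nets:
            net_to_cells.setdefault(net, []).append(index[name])

    # Connect every pair of cells on the same net, in both directions.
    edges = set()
    for cells in net_to_cells.values():
        for a, b in combinations(sorted(set(cells)), 2):
            edges.add((a, b))
            edges.add((b, a))

    src, dst = zip(*sorted(edges)) if edges else ((), ())
    edge_index = [list(src), list(dst)]  # shape [2, num_edges]
    return names, x, edge_index


if __name__ == "__main__":
    names, x, edge_index = netlist_to_graph(NETLIST)
    print(names, x, edge_index)
```

Clique expansion is simple, but a net with many pins produces many edges. Representing nets as their own nodes (a bipartite graph) is a common alternative for large designs. To feed the output to PyTorch Geometric, wrap it as torch_geometric.data.Data(x=torch.tensor(x), edge_index=torch.tensor(edge_index, dtype=torch.long)).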

Projects

Disaster Response NLP

Processed ~45K disaster tweets across 14 events. Developed geospatial and textual models to identify locations of highest need, using word frequency analysis and sentiment scoring to extract actionable signals.

Python · Pandas · Geospatial
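The combination of word frequency and sentiment scoring described above can be sketched as a lexicon-based score averaged per location, so that the most negative locations rank first as candidate high-need areas. This is a minimal illustration, not the project's actual model. LEXICON is a hypothetical toy word list, and the (location, text) tweet format is an assumption. A real pipeline would use a curated sentiment resource and remove stopwords before counting.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy lexicon; negative scores indicate distress.
LEXICON = {"help": -1, "trapped": -2, "flooded": -2, "safe": 2, "thanks": 1}


def tokenize(text):
    """Lowercase and split on anything that is not a letter or apostrophe."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment(tokens):
    """Sum lexicon scores for known words; unknown words score 0."""
    return sum(LEXICON.get(t, 0) for t in tokens)


def rank_locations(tweets):
    """tweets: list of (location, text). Returns locations by mean sentiment, most negative first."""
    scores = defaultdict(list)
    for location, text in tweets:
        scores[location].append(sentiment(tokenize(text)))
    means = {loc: sum(s) / len(s) for loc, s in scores.items()}
    return sorted(means, key=means.get)


def top_words(tweets, n=3):
    """Most frequent tokens across all tweets."""
    counts = Counter(t for _, text in tweets for t in tokenize(text))
    return counts.most_common(n)


if __name__ == "__main__":
    tweets = [
        ("Houston", "Family trapped on roof, need help"),
        ("Houston", "Street flooded, please help"),
        ("Austin", "We are safe, thanks everyone"),
    ]
    print(rank_locations(tweets))  # ['Houston', 'Austin']
    print(top_words(tweets, n=1))  # [('help', 2)]
```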

Disinformation Detection System

Collected and cleaned 1,000+ text samples via web scraping. Reduced preprocessing time by 40% with structured NLP pipelines for feature extraction.

Python · Web Scraping · NLP
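A structured preprocessing pipeline for scraped text usually chains a few deterministic cleaning steps, drops duplicates, and then extracts features. The sketch below shows that shape with the standard library. Each step decodes HTML entities, strips URLs, and normalizes whitespace and case, and a bag-of-words step produces count vectors. The specific steps and function names are illustrative assumptions, not the project's actual pipeline.

```python
import html
import re

URL_RE = re.compile(r"https?://\S+")
WS_RE = re.compile(r"\s+")


def clean(text: str) -> str:
    """Decode HTML entities, drop links, collapse whitespace, lowercase."""
    text = html.unescape(text)
    text = URL_RE.sub(" ", text)
    return WS_RE.sub(" ", text).strip().lower()


def preprocess(samples):
    """Clean samples and drop empty or duplicate results, keeping first-seen order."""
    seen, out = set(), []
    for sample in samples:
        cleaned = clean(sample)
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            out.append(cleaned)
    return out


def bag_of_words(docs):
    """Return (sorted vocabulary, per-document count vectors)."""
    vocab = sorted({w for d in docs for w in d.split()})
    idx = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for d in docs:
        vec = [0] * len(vocab)
        for w in d.split():
            vec[idx[w]] += 1
        vectors.append(vec)
    return vocab, vectors
```

Because cleaning runs before deduplication, near-identical scraped copies (for example, the same post with and without a tracking link) collapse into one sample.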