Data
Data pipelines, scraping, ETL, and large-scale data systems
Data Engineer @ Open Project @ Berkeley
Aug 2024 – Jan 2025
- Built an async web scraper with 10,000+ automated HTTP requests per run, collecting 100K+ data points
- Reduced end-to-end collection time by ~100x and structured 600+ classroom schedules into JSON availability models
- Cleaned and normalized 500K+ user-submitted ingredient entries, increasing recognized inputs from <20% to >95%
Python, Playwright, Pandas
Data Scientist @ Synopsys (Contract)
Aug 2025 – Dec 2025
- Transformed raw chip netlist data into graph representations for GNN training across ~50–100 designs
- Built data pipelines with PyTorch Geometric to construct and featurize netlist graphs with attention layers
- Benchmarked inference speed and model scalability across varying design complexities
Python, PyTorch Geometric, Pandas, ETL
Projects
Disaster Response NLP
Processed ~45K disaster tweets across 14 events. Developed geospatial and textual models to identify locations of highest need, using word frequency analysis and sentiment scoring to extract actionable signals.
Python, Pandas, Geospatial
Disinformation Detection System
Collected and cleaned 1,000+ text samples via web scraping. Reduced preprocessing time by 40% with structured NLP pipelines for feature extraction.
Python, Web Scraping, NLP