I’m a Data Engineer with the California Office of Digital Innovation, building a data platform for understanding Californians and their needs. My current focus is on developing data pipelines, models, and analysis to examine mobility patterns and the economic impact of Covid-19 Stay at Home policies.
I also have projects that visualize the supply of short-term home rentals and recommend fanfiction. Both of these involve ETL and warehousing large amounts of public data. I’ve previously created pipelines via scraping and APIs for a recruiting firm that allowed for highly individualized executive searches. Before that, I worked as a software engineer and data analyst in the advertising industry.
The global short-term rentals project is a full data pipeline and warehouse. A dashboard lets users explore the impact of short-term rental (Airbnb) listings on housing. Data is pulled from three separate public datasets and comprises over 35 million records spanning 2015–2020. The tools used include Python, Snowflake, DBT, and Metabase.
DBT was used for the transformation layer. DBT enables automated analytics engineering features such as generated documentation and built-in data testing.
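As a sketch only (the model name, source definition, and columns here are hypothetical, not taken from the project), a dbt model in such a transformation layer is typically a templated SELECT statement:

```sql
-- models/listings_per_city.sql (hypothetical example model)
-- Aggregates raw listing records into one row per city.
select
    city,
    count(*) as listing_count
from {{ source('raw', 'listings') }}
group by city
```

dbt compiles each such file into a table or view in the warehouse, which is what makes the generated documentation and lineage graph possible.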
A video demonstration is available.
To run the pipeline yourself:

1. Install the dependencies with `pip install -r requirements.txt`.
2. Run `set -o allexport; source .env; set +o allexport` to export credentials and other environment variables. You'll need to make adjustments on a Windows machine.
3. Create the warehouse, using `src/create_warehouse.sql` as a guide.
4. Extract the data with `python src/extract.py`. The script may take a few hours.
5. Load the data with `python src/load.py`. Again, depending on your connection, this will take a while.
6. Run `dbt docs generate`, then `dbt docs serve` for locally hosted documentation.
7. Explore `visualizations/`, adjusting the visualization component as needed.
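The `source .env` step above exports credentials into the environment, and the Python scripts then read them back. A minimal sketch of that pattern (the `SNOWFLAKE_*` variable names and the helper function are hypothetical, not the project's actual code):

```python
import os

# Hypothetical variable names -- the project's .env may use different keys.
REQUIRED_VARS = ["SNOWFLAKE_ACCOUNT", "SNOWFLAKE_USER", "SNOWFLAKE_PASSWORD"]

def load_credentials(env=os.environ):
    """Return a dict of required credentials, raising if any are missing."""
    missing = [name for name in REQUIRED_VARS if name not in env]
    if missing:
        # Failing fast here gives a clearer error than a connection timeout.
        raise KeyError(f"Missing environment variables: {missing}")
    return {name: env[name] for name in REQUIRED_VARS}
```

Validating the environment up front like this makes steps 4 and 5 fail with an actionable message instead of an opaque connection error.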
A personal project by Rebecca Sanjabi.