Analytics Engineer
As a data engineer at the California Office of Digital Innovation, I’m working on a data platform for understanding Californians and their needs. My current focus is on developing data pipelines, models, and analyses from surveys and web analytics to build a holistic view of public sentiment.
I have a project that visualizes the supply of short-term home rentals, which involved creating an ELT pipeline, a data warehouse, and a hosted BI server for displaying data from publicly available datasets. Previously, I built pipelines using scraping and APIs for a recruiting firm, which enabled highly individualized executive searches. Before that, I worked as a software engineer and data analyst in the advertising industry.
Short-Term Rental Data Warehouse
The global short-term rentals project is a full data pipeline and warehouse. A dashboard lets users explore the impact of short-term rental listings (Airbnb) on housing. Data is pulled from three separate public datasets and consists of over 35 million records from 2015 to 2020. The tools used include Python, Snowflake, dbt, and Metabase.
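As a rough illustration of the ELT pattern described here (load raw data first, then transform it inside the warehouse), the sketch below uses Python's built-in sqlite3 module as a stand-in for Snowflake. The table names, column names, and sample rows are hypothetical and are not taken from the project; the real pipeline's schema and transformations may look quite different.

```python
import sqlite3

# Hypothetical raw rows standing in for one public listings dataset.
RAW_LISTINGS = [
    ("1001", "Paris", "2019-03-01", "Entire home/apt"),
    ("1002", "Paris", "2019-03-01", "Private room"),
    ("1003", "Austin", "2019-03-01", "Entire home/apt"),
]


def load_raw(conn, rows):
    """Load step: copy source rows into a raw table without reshaping them."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_listings "
        "(listing_id TEXT, city TEXT, scraped_on TEXT, room_type TEXT)"
    )
    conn.executemany("INSERT INTO raw_listings VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def transform(conn):
    """Transform step: build an aggregate table with SQL inside the warehouse."""
    conn.execute("DROP TABLE IF EXISTS listings_by_city")
    conn.execute(
        "CREATE TABLE listings_by_city AS "
        "SELECT city, COUNT(*) AS listing_count, "
        "SUM(room_type = 'Entire home/apt') AS entire_home_count "
        "FROM raw_listings GROUP BY city"
    )
    conn.commit()


def run_pipeline(rows):
    """Run load then transform against an in-memory database (assumed stand-in)."""
    conn = sqlite3.connect(":memory:")
    load_raw(conn, rows)
    transform(conn)
    return conn.execute(
        "SELECT city, listing_count, entire_home_count "
        "FROM listings_by_city ORDER BY city"
    ).fetchall()


if __name__ == "__main__":
    print(run_pipeline(RAW_LISTINGS))
```

Keeping the raw table untouched and deriving aggregates from it in SQL is the main design point: the transformation can be rerun or changed without extracting the source data again.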
dbt was used for the transformation layer. dbt supports automated analytics tooling such as model tests and generated documentation.
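To give a sense of what automated tests in a transformation layer check, here is a Python sketch of two checks analogous to dbt's built-in unique and not_null tests. It is not how dbt implements them: dbt compiles its tests to SQL and runs them in the warehouse. The sketch borrows only the idea that a test is a query returning failing rows. The sqlite3 database, the listings table, and its columns are hypothetical stand-ins.

```python
import sqlite3


def failing_rows(conn, sql):
    """Return rows matching a test query; an empty result means the test passes."""
    return conn.execute(sql).fetchall()


def check_not_null(conn, table, column):
    """Rows where the column is NULL (analogous to a not_null test)."""
    return failing_rows(conn, f"SELECT * FROM {table} WHERE {column} IS NULL")


def check_unique(conn, table, column):
    """Values that appear more than once (analogous to a unique test)."""
    return failing_rows(
        conn,
        f"SELECT {column}, COUNT(*) FROM {table} "
        f"WHERE {column} IS NOT NULL GROUP BY {column} HAVING COUNT(*) > 1",
    )


def build_demo():
    """Create a small in-memory table with one duplicate id and one NULL city."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE listings (listing_id TEXT, city TEXT)")
    conn.executemany(
        "INSERT INTO listings VALUES (?, ?)",
        [("1001", "Paris"), ("1001", "Paris"), ("1002", None)],
    )
    return conn


if __name__ == "__main__":
    demo = build_demo()
    print(check_unique(demo, "listings", "listing_id"))
    print(check_not_null(demo, "listings", "city"))
```

The table and column names are interpolated directly into the SQL, which is acceptable for a fixed list of trusted model names but should not be done with untrusted input.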
A video demonstration is available.
1. Install the Python dependencies: pip install -r requirements.txt
2. Export credentials and other environment variables: set -o allexport; source .env; set +o allexport (you’ll need to make adjustments on a Windows machine).
3. Set up the warehouse, using src/create_warehouse.sql as a guide.
4. Run the extract: python src/extract.py (the script may take a few hours).
5. Run the load: python src/load.py (again, depending on your connection, this will take a while).
6. Install the dbt packages: dbt deps
7. Build the models: dbt run
8. Run the tests: dbt test
9. Generate the documentation: dbt docs generate, then dbt docs serve for locally hosted documentation.
10. Finally, see visualizations/, adjusting the visualization component.

A personal project by Rebecca Sanjabi.