I’m a Data Engineer with the California Office of Digital Innovation, building a data platform for understanding Californians and their needs. My current focus is on developing data pipelines, models, and analysis to examine mobility patterns and the economic impact of COVID-19 stay-at-home policies.
I also have projects that visualize the supply of short-term home rentals and recommend fanfiction. Both of these involve ETL and warehousing large amounts of public data. Previously, I built pipelines using scraping and APIs for a recruiting firm, enabling highly individualized executive searches. Before that, I worked as a software engineer and data analyst in the advertising industry.
The global short-term rentals project is a full data pipeline and warehouse. A dashboard allows for the exploration of the impact of short-term rental listings (Airbnb) on housing. Data is pulled from three separate public datasets and consists of over 35 million records from 2015-2020. The tools used include Python, Snowflake, DBT, and Metabase.
DBT was used for the transformation layer. DBT allows for automated analytics tools like:
A working dashboard is available from a Metabase server hosted on Heroku.
A video demonstration is also available.
pip install -r requirements.txt.
set -o allexport; source .env; set +o allexport to export credentials and other environment variables. You’ll need to make adjustments on a Windows machine.
src/create_warehouse.sql as a guide.
python src/extract.py. The script may take a few hours.
python src/load.py. Again, depending on your connection, this will take a while.
dbt docs generate, and
dbt docs serve for locally hosted documentation.
visualizations/, adjusting the visualization component.
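The credential step above relies on shell syntax (set -o allexport; source .env), which is why Windows needs adjustments. One possible adjustment, offered only as a sketch and not taken from the project, is to read the file from Python instead. The helper names parse_env and load_env_file and the EXAMPLE_ variables are hypothetical, and the parser assumes simple KEY=VALUE lines with no shell variable expansion.

```python
import os


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        # Tolerate an optional leading "export " as some .env files use it.
        if key.startswith("export "):
            key = key[len("export "):].strip()
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key] = value
    return values


def load_env_file(path=".env"):
    """Read a .env file and copy its values into this process's environment."""
    with open(path, encoding="utf-8") as handle:
        values = parse_env(handle.read())
    os.environ.update(values)
    return values


# Hypothetical variable names, for illustration only.
example = parse_env("# credentials\nEXAMPLE_USER=alice\nEXAMPLE_PASSWORD='s3cret'\n")
print(sorted(example))
```

Values loaded this way are visible only to the Python process that calls load_env_file and to any child processes it starts, not to the parent shell, so the call would need to happen at the start of whichever script needs the credentials.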
A personal project by Rebecca Sanjabi.
Fans who want to read fanfiction will invariably find themselves at the massive, community-supported, volunteer-run Archive of Our Own (or AO3 for short). This Hugo Award-winning, open-source website archives over 6 million fanworks for over 5 million registered users in 35,000 fandoms. As a community-run and community-focused non-profit, it does a stellar job of its primary function, which is saving and collating amateur works. However, beyond essential filtering functions, it provides no way of recommending works to users.
I wanted to give fans an alternate way to find new works, so I developed a recommender system, available at fanrecs.com. AO3 users have the option of leaving kudos on fanworks (fan-created items like fiction, art, videos, or commentary). These kudos are the basis for an item-to-item collaborative filtering model built on implicit feedback. The model is hosted as a web-based microservice, where a user can enter the ID of a favorite work and get suggestions for other works they might enjoy.
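To make the item-to-item idea concrete, here is a minimal sketch of how kudos alone can drive recommendations. It is not the model behind fanrecs.com, whose details are not given here. It assumes that each kudos is a (user, work id) pair and scores two works by the cosine similarity of their sets of kudos-givers. All data and function names are made up for illustration.

```python
from collections import defaultdict
from math import sqrt


def build_item_users(kudos):
    """Map each work id to the set of users who left kudos on it."""
    item_users = defaultdict(set)
    for user, work in kudos:
        item_users[work].add(user)
    return item_users


def recommend(work_id, item_users, top_n=3):
    """Rank other works by cosine similarity of their kudos-giver sets."""
    fans = item_users.get(work_id, set())
    scores = []
    for other, other_fans in item_users.items():
        if other == work_id:
            continue
        shared = len(fans & other_fans)
        if shared:
            # For binary (kudos / no kudos) data, cosine similarity reduces
            # to shared users over the geometric mean of the two set sizes.
            score = shared / sqrt(len(fans) * len(other_fans))
            scores.append((other, score))
    # Highest score first; ties broken by work id for stable output.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_n]


# Made-up (user, work id) kudos pairs.
kudos = [
    ("u1", 101), ("u1", 102),
    ("u2", 101), ("u2", 102), ("u2", 103),
    ("u3", 102), ("u3", 103),
    ("u4", 104),
]
item_users = build_item_users(kudos)
recs = recommend(101, item_users)
print(recs)
```

In this toy data, work 102 ranks above 103 for work 101 because it shares both of 101's kudos-givers, while work 104 never appears because it shares none. The sketch compares the query work against every other work, which is fine for a handful of items but would likely need a sparse-matrix or approximate approach at the scale of an archive.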
Currently, recommendations cover only a subset of works (namely, the Star Wars fandom circa May 2020) while I work out a way to scale the recommendations beyond 125k fics. Details on the implementation can be found in my GitHub repository.