Todd Birchard

Senior Data Engineer with Product Management background. Does everything incorrectly before coming to realizations known as best practices.

Manage Files in Google Cloud Storage With Python

Manage files in your Google Cloud Storage bucket using the google-cloud-storage Python library.

Working with PySpark RDDs

Working with Spark’s original data structure API: Resilient Distributed Datasets.

Manage Data Pipelines with Apache Airflow

Use Apache Airflow to build and monitor better data pipelines.

Using Hierarchical Indexes With Pandas

Use Pandas’ MultiIndex to make your data work harder for you.

Managing Flask Session Variables

Using Flask-Session and Flask-Redis to store user session variables.

Reshaping Pandas DataFrames

A guide to DataFrame manipulation using groupby, melt, pivot tables, pivot, transpose, and stack.

Structured Streaming with PySpark

Become familiar with building a structured stream in PySpark using the Databricks interface.

DataFrame Transformations in PySpark (Continued)

Continuing to apply transformations to Spark DataFrames using PySpark.

Fixing your NPM installation

Fixing an npm installation that broke after ‘sudo’ was misused.

Becoming Familiar with Apache Kafka and Message Queues

An overview of how Kafka works, as well as equivalent message brokers.


