CineETL: Movie Insights Data Pipeline
- Tech Stack: AWS Redshift, Airflow, PySpark, AWS Glue, Amazon S3, Docker, Python
- GitHub URL: Project Link
CineETL is a robust data pipeline designed to extract, transform, and load movie-related data, providing comprehensive insights into the film industry's dynamics.
Crafted an ETL pipeline for 26M user ratings and 45K movies, ingesting data into AWS Redshift at 10K records/minute.
Normalized the data model, automated data quality checks, orchestrated the pipeline with Airflow, and achieved a 99% daily ETL cycle success rate.
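As an illustration of what an automated data quality check on a batch of ratings might look like, here is a minimal Python sketch. It is not the project's actual code: the column names (`user_id`, `movie_id`, `rating`), the 0.5 to 5.0 rating range, and the helpers `run_quality_checks` and `all_passed` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def run_quality_checks(rows, required=("user_id", "movie_id", "rating"),
                       rating_range=(0.5, 5.0)):
    """Run simple checks on a batch of rating rows (a list of dicts).

    Column names and the rating range are hypothetical defaults.
    """
    results = []

    # Check 1: the batch is not empty.
    results.append(CheckResult("non_empty", len(rows) > 0, f"{len(rows)} rows"))

    # Check 2: no required column is missing or None.
    nulls = sum(1 for r in rows if any(r.get(c) is None for c in required))
    results.append(CheckResult("no_nulls", nulls == 0, f"{nulls} rows with nulls"))

    # Check 3: every non-null rating falls inside the assumed range.
    lo, hi = rating_range
    bad = sum(1 for r in rows
              if r.get("rating") is not None and not lo <= r["rating"] <= hi)
    results.append(CheckResult("rating_in_range", bad == 0,
                               f"{bad} out-of-range ratings"))

    # Check 4: each (user_id, movie_id) pair appears at most once.
    keys = [(r.get("user_id"), r.get("movie_id")) for r in rows]
    dupes = len(keys) - len(set(keys))
    results.append(CheckResult("unique_keys", dupes == 0, f"{dupes} duplicate keys"))

    return results


def all_passed(results):
    """Return True only if every check in the list passed."""
    return all(r.passed for r in results)
```

In an orchestrated pipeline, a function like this could be called from a task after each load, raising an exception when `all_passed` returns False so the run is marked as failed rather than silently loading bad data.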