The dbt™ Data Modeling Challenge - Social Media Edition has concluded. We extend our heartfelt gratitude to all participants for their outstanding contributions and innovative insights into the world of social media data.
Explore the remarkable work of our participants:
- 🥇 Bruno Souza De Lima
- 🥈 Jayeson Gao
- 🥉 Mads Spanggaard
- 🥉 Hetvi Parekh
- Alexandra Vajda
- Andy Malo
- Cyril Motte
- Demik Freitas
- Ilse Tse
- Isin Pesch
- Nancy Amandi
- Raj Kumar
- Rasmus Sorensen
- Sophie Li
- Ufuk Ceyhanli
- Waldemar Hujo
We encourage you to review these submissions for inspiration and to learn from the diverse approaches taken by our participants.
- Submit Your Application: Fill out the registration form
- Verification: We'll review your application against the entry requirements
After verification, you'll receive two confirmation emails from Paradime:
- Your Credentials for the dbt™ Data Modeling Challenge - Social Media Edition
- [Paradime] Activate your account
Follow the instructions in these emails to set up your free accounts.
- Technical Support: Join Paradime's #social-media-data-challenge Slack channel.
- Additional Support: Check out the MotherDuck Slack Community.
- Troubleshooting Confirmation Emails:
  - Ensure you meet the entry requirements.
  - Search for "[email protected]" in your registration email account.
  - If you registered with a personal email, check LinkedIn for DMs from Parker Rogers, who may ask for a business email if applicable.
  - Still no luck? DM Parker Rogers via Paradime's Challenge Slack.
Before starting your project, familiarize yourself with the following key information:
Deadline: September 9, 2024, at 11:59 PM PT
To excel in this challenge, familiarize yourself with these essential tools:
Paradime is required for SQL and dbt™ development. Other Paradime features are optional.
- Code IDE Tutorial: Navigate the code IDE and master basic features.
- Commands Panel Tutorial: Learn valuable Paradime features.
- DinoAI Copilot Tutorial: Enhance your SQL and dbt™ development with DinoAI.
- Paradime Documentation: Comprehensive product documentation for additional learning.
MotherDuck is required for data storage and compute. Other MotherDuck & DuckDB features are optional.
- Getting started tutorial with MotherDuck & DuckDB
- Working with dbt and MotherDuck: understand how to configure your dbt project and more!
- How to connect MotherDuck and Hex
- MotherDuck documentation website
Hex is required for data visualizations and additional analysis. Other Hex features are optional.
- Getting started with Hex
- Writing SQL in Hex
- Hex Use Case Gallery for inspiration and examples
- Hex Foundations YouTube course
You can bring in any data you want as long as it's user-generated social media data or relevant data to supplement the user-generated social media data.
- Your MotherDuck account includes a sample social media dataset, hacker_news, which contains posts and comments.
- Your Paradime account links to this GitHub repository, with a pre-configured dbt™ model, stg_hacker_news.sql, which references the hackernews table in MotherDuck.
Important: These resources are provided merely as a convenience. You are not required to use them in your project. In fact, to excel in this challenge, you must bring in data of your own.
- Query data directly from your local machine or an object storage service (AWS S3, Azure Blob Storage, Google Cloud Storage).
- Query data directly from Hugging Face, which has countless social media datasets at your disposal.
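As a rough sketch of what "querying directly" can look like: DuckDB, the engine behind MotherDuck, can read Parquet, CSV, and JSON files in place through its `read_parquet`, `read_csv`, and `read_json` table functions, including `s3://` and `hf://datasets/...` paths. The Python helper below only builds the SQL text, which you could then paste into a query or adapt into a dbt™ model. The function name `build_source_query` and every path in the example are hypothetical placeholders, not real datasets, and remote paths may need extra credentials or extension setup that is not shown here.

```python
from pathlib import PurePosixPath

# DuckDB table functions, keyed by file extension.
READERS = {
    ".parquet": "read_parquet",
    ".csv": "read_csv",
    ".json": "read_json",
}


def build_source_query(uri: str) -> str:
    """Return a DuckDB SELECT statement that reads the file at `uri` in place."""
    suffix = PurePosixPath(uri).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"unsupported file type: {uri!r}")
    escaped = uri.replace("'", "''")  # escape single quotes for the SQL literal
    return f"SELECT * FROM {READERS[suffix]}('{escaped}')"


# Hypothetical locations, for illustration only.
examples = [
    "data/reddit_posts.csv",
    "s3://my-bucket/tweets/2022.parquet",
    "hf://datasets/some-user/some-dataset/comments.parquet",
]
for uri in examples:
    print(build_source_query(uri))
```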
Use Paradime, MotherDuck, and Hex to uncover compelling insights from social media data. Aim for accurate, relevant, and engaging discoveries.
Check out these resources:
- Winning Strategies for Paradime's Movie Data Modeling Challenge: Learn the strategies, best practices, and insights uncovered from winning participants in previous Data Modeling Challenges.
- Explore winning submissions from Paradime's recent Data Modeling Challenges:
  - Nikita Volynets' Submission - 2nd place winner from Paradime's dbt Data Modeling Challenge - NBA Edition.
  - Spence Perry's Submission - 1st place winner from Paradime's dbt Data Modeling Challenge - NBA Edition.
  - Isin Pesch's Submission - 1st place winner from Paradime's dbt Data Modeling Challenge - Movie Edition.
Your primary goal is to use Paradime, MotherDuck, and Hex to unearth compelling insights from social media data. With so many social media platforms, chat forums, and supplementary datasets, the possibilities for discovery are virtually limitless. Aim to generate accurate, relevant, scroll-stopping insights. Here are some ideas:
- COVID-19 Sentiment Analysis
  - Analysis Question: How has the sentiment around COVID-19 on Reddit changed over time? Why?
  - Required Social Media Data: Reddit posts and comments related to COVID-19, or similar dataset.
  - Optional/Supplementary Data: Key dates, news, events, and/or anything that points to why sentiment has changed over time.
- Donald Trump Popularity Trends
  - Analysis Question: How has Donald Trump's popularity changed over time?
  - Required Social Media Data: A sample of Twitter posts, mentions, and engagement, containing the words "Donald Trump" over the last 10 years.
  - Optional/Supplementary Data: Key dates, news, events, and/or anything that points to why popularity has changed over time.
- Top YouTube Creators Study
  - Analysis Question: Who are the biggest YouTube creators, and why?
  - Required Social Media Data: YouTube comments, engagement metrics, etc.
  - Optional/Supplementary Data: Trending YouTube Video statistics, or similar datasets.
- 2022 NFL Super Bowl Commercial Impact
  - Analysis Question: Which commercials were most popular during the 2022 NFL Super Bowl?
  - Required Social Media Data: Twitter and/or Reddit posts, mentions, and engagement during the 4-hour time block of the NFL Super Bowl. Only pull data that contains information about brands that had Super Bowl commercials.
  - Optional/Supplementary Data:
    - For public companies that advertised, pull stock market data to see if there's any correlation between Super Bowl commercial success and stock price.
    - Using Super Bowl advertisement cost data, identify which brands had the highest social engagement per dollar spent.
- Hacker News Trend Analysis
  - Analysis Question: What are the most discussed topics and popular websites on Hacker News in 2022?
  - Required Social Media Data: Hacker News dataset sample (January 2022 to November 2022).
  - Optional/Supplementary Data: Tech industry news and events, stock market data for frequently mentioned tech companies.
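To make the last idea more concrete, here is a minimal sketch of the "popular websites" half of the Hacker News question: reduce each story URL to its domain and count occurrences. In an actual submission this logic would more likely live in a SQL dbt™ model; Python is used here only to illustrate the steps. The function name `top_domains` and the sample URLs are made up for this example and do not come from the challenge dataset.

```python
from collections import Counter
from urllib.parse import urlparse


def top_domains(urls, n=3):
    """Count how often each website (domain) appears in a list of story URLs."""
    counts = Counter()
    for url in urls:
        host = urlparse(url).netloc.lower()
        if not host:
            continue  # skip rows without a URL, such as text-only posts
        if host.startswith("www."):
            host = host[4:]  # treat www.example.com and example.com as one site
        counts[host] += 1
    return counts.most_common(n)


# Made-up sample rows, for illustration only.
sample_urls = [
    "https://example.com/blog/post-1",
    "https://www.example.com/blog/post-2",
    "https://example.org/news/story",
    "",
    "https://example.com/",
    "https://example.org/news/another-story",
    "https://docs.example.net/guide",
]
print(top_domains(sample_urls))
```

Stripping the `www.` prefix is a simplifying assumption; depending on the question you ask, you may prefer to keep subdomains separate or group them under a registered domain.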
Use Hex to build impactful visualizations that complement your insights.
Deadline: September 9, 2024, at 11:59 PM PT
Follow this step-by-step tutorial to submit your project:
- Email your submission to Parker Rogers ([email protected])
  - Subject: "<first_and_last_name> - dbt Data Modeling Challenge - Social Media Edition"
  - Include:
    - GitHub branch link with your dbt™ models
    - README.md file (use the template below)
Use this template as a starting point for your submission. Feel free to customize it to best showcase your project:
# Social Media Data Analysis - dbt™ Modeling Challenge
## Table of Contents
1. [Introduction](#introduction)
2. [Data Sources](#data-sources)
3. [Methodology](#methodology)
4. [Insights](#insights)
5. [Conclusions](#conclusions)
## Introduction
[Brief project overview and goals]
## Data Sources
- Dataset 1: [Name] - [Description]
- Dataset 2: [Name] - [Description]
- [Add more as needed]
### Data Lineage
[Insert data lineage image]
## Methodology
### Tools Used
- Paradime: SQL and dbt™ development
- MotherDuck: Data storage and computing
- Hex: Data visualization
- [Other tools]
### Applied Techniques
- [List key techniques and practices used]
## Insights
### Insight 1
- Title
- Visualization
- Analysis
[Repeat for additional insights]
## Conclusions
[Summarize key findings and their implications]