dbt™ Data Modeling Challenge - Social Media Edition

🏁 Competition Status: Completed

The dbt™ Data Modeling Challenge - Social Media Edition has concluded. We extend our heartfelt gratitude to all participants for their outstanding contributions and innovative insights into the world of social media data.

🏆 View All Submissions

Explore the remarkable work of our participants:

We encourage you to review these submissions for inspiration and to learn from the diverse approaches taken by our participants.


📋 Table of Contents

  1. Getting Started
  2. Competition Details
  3. Building Your Project
  4. Submission Guidelines
  5. Submission Template

🚀 Getting Started

1. Registration and Verification

2. Account Setup

After verification, you'll receive two confirmation emails from Paradime:

  • Your Credentials for the dbt™ Data Modeling Challenge - Social Media Edition
  • [Paradime] Activate your account

Follow the instructions in these emails to set up your free accounts for:

3. Support and FAQs

🏆 Competition Details

Before starting your project, familiarize yourself with the following key information:

🛠 Building Your Project

Deadline: September 9, 2024, at 11:59 PM PT

Step 1: Master the Required Tools

To excel in this challenge, familiarize yourself with these essential tools:

Paradime

Paradime is required for SQL and dbt™ development. Other Paradime features are optional.

Learning Resources:

MotherDuck

MotherDuck is required for data storage and compute. Other MotherDuck & DuckDB features are optional.

Learning Resources:

Hex

Hex is required for data visualizations and additional analysis. Other Hex features are optional.

Learning Resources:

Step 2: Bringing in New Data

You can bring in any data you want, as long as it is user-generated social media data or data that supplements it.

How to bring in New Data

  • Your MotherDuck account includes a sample social media dataset, hacker_news, which contains posts and comments.
  • Your Paradime account links to this GitHub repository, with a pre-configured dbt™ model, stg_hacker_news.sql, which references the hackernews table in MotherDuck.

Important: These resources are provided merely as a convenience. You are not required to use them in your project. In fact, to excel in this challenge, you must bring in data on your own.

How to bring new data into MotherDuck

  1. Query data directly from your local machine or an object storage service (AWS S3, Azure Blob Storage, Google Cloud Storage).
  2. Query data directly from Hugging Face, which has countless social media datasets at your disposal.
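
As a rough illustration of both options, the sketch below builds the kind of DuckDB SQL statement you might run against MotherDuck to copy a remote Parquet file into a table. It assumes DuckDB's `read_parquet` function and its `s3://` and `hf://datasets/...` path schemes; the helper `build_load_statement`, the table names, the bucket, and the dataset path are all hypothetical placeholders, and S3 access would still need credentials configured separately.

```python
def build_load_statement(table_name: str, source_uri: str) -> str:
    """Return a DuckDB statement that copies a remote Parquet file into a table."""
    # Only allow simple identifiers so the table name is safe to interpolate.
    if not table_name.isidentifier():
        raise ValueError(f"unsafe table name: {table_name!r}")
    # Escape single quotes so the URI stays a valid SQL string literal.
    escaped_uri = source_uri.replace("'", "''")
    return (
        f"CREATE OR REPLACE TABLE {table_name} AS "
        f"SELECT * FROM read_parquet('{escaped_uri}');"
    )


# Hypothetical sources: neither the bucket nor the dataset path is real.
s3_statement = build_load_statement(
    "raw_reddit_posts", "s3://my-bucket/reddit/posts.parquet"
)
hf_statement = build_load_statement(
    "raw_tweets", "hf://datasets/some-org/some-dataset/data/train.parquet"
)
print(s3_statement)
print(hf_statement)
```

The generated statements can be pasted into the MotherDuck UI or any DuckDB client; once the table exists, a dbt™ staging model can select from it like any other source.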

Step 3: Generate Insights

Use Paradime, MotherDuck, and Hex to uncover compelling insights from social media data. Aim for accurate, relevant, and engaging discoveries.

Need a spark of inspiration?

Check out these resources:

Potential Insight Ideas

With so many social media platforms, chat forums, and supplementary datasets, the possibilities for discovery are virtually limitless. Aim for accurate, relevant, scroll-stopping insights. Here are some ideas:

  • COVID-19 Sentiment Analysis

    • Analysis Question: How has the sentiment around COVID-19 on Reddit changed over time? Why?
    • Required Social Media Data: Reddit posts and comments related to COVID-19, or similar dataset.
    • Optional/Supplementary Data: Key dates, news, events, and/or anything that points to why sentiment has changed over time.
  • Donald Trump Popularity Trends

    • Analysis Question: How has Donald Trump's popularity changed over time?
    • Required Social Media Data: A sample of Twitter posts, mentions, and engagement, containing the words "Donald Trump" over the last 10 years.
    • Optional/Supplementary Data: Key dates, news, events, and/or anything that points to why popularity has changed over time.
  • Top YouTube Creators Study

    • Analysis Question: Who are the biggest YouTube creators, and why?
    • Required Social Media Data: YouTube comments, engagement metrics, etc.
    • Optional/Supplementary Data: Trending YouTube Video statistics, or similar datasets.
  • 2022 NFL Super Bowl Commercial Impact

    • Analysis Question: Which commercials were most popular during the 2022 NFL Super Bowl?
    • Required Social Media Data: Twitter and/or Reddit posts, mentions, and engagement during the 4-hour time block of the NFL Super Bowl. Only pull data that mentions brands that had Super Bowl commercials.
    • Optional/Supplementary Data:
      • For public companies that advertised, pull stock market data to see if there's any correlation between Super Bowl commercial success and stock price.
      • Using Super Bowl advertisement cost data, identify which brands had the highest social engagement per dollar spent.
  • Hacker News Trend Analysis

    • Analysis Question: What are the most discussed topics and popular websites on Hacker News in 2022?
    • Required Social Media Data: Hacker News dataset sample (January 2022 to November 2022).
    • Optional/Supplementary Data: Tech industry news and events, stock market data for frequently mentioned tech companies.
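
To make the "popular websites" part of the Hacker News idea concrete, here is a minimal sketch that counts how often each domain appears in a list of story URLs. It is an illustration only: the function `top_domains` and the sample URLs are made up, and in a real project this logic would more likely live in a dbt™ model written in SQL.

```python
from collections import Counter
from urllib.parse import urlparse


def top_domains(urls, n=3):
    """Count how often each website domain appears in a list of story URLs."""
    counts = Counter()
    for url in urls:
        host = urlparse(url).netloc.lower()
        if not host:
            continue  # skip text-only posts or malformed URLs
        # Treat "www.example.com" and "example.com" as the same site.
        if host.startswith("www."):
            host = host[4:]
        counts[host] += 1
    return counts.most_common(n)


# Made-up sample data standing in for the URL column of a posts table.
sample_urls = [
    "https://www.example.com/a",
    "https://example.com/b",
    "https://blog.example.org/post",
    "",
]
print(top_domains(sample_urls))
# prints [('example.com', 2), ('blog.example.org', 1)]
```

Subdomains such as `blog.example.org` are kept distinct here; whether to roll them up into a parent domain is an analysis choice worth stating in your README.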

Step 4: Create Data Visualizations

Use Hex to build impactful visualizations that complement your insights.

📤 Submission Guidelines

Deadline: September 9, 2024, at 11:59 PM PT

Follow these steps to submit your project:

  1. Email your submission to Parker Rogers ([email protected])
  2. Subject: "<first_and_last_name> - dbt Data Modeling Challenge - Social Media Edition"
  3. Include:
    • GitHub branch link with your dbt™ models
    • README.md file (use the template below)

📝 Submission Template

Use this template as a starting point for your submission. Feel free to customize it to best showcase your project:

# Social Media Data Analysis - dbt™ Modeling Challenge

## Table of Contents
1. [Introduction](#introduction)
2. [Data Sources](#data-sources)
3. [Methodology](#methodology)
4. [Insights](#insights)
5. [Conclusions](#conclusions)

## Introduction
[Brief project overview and goals]

## Data Sources
- Dataset 1: [Name] - [Description]
- Dataset 2: [Name] - [Description]
- [Add more as needed]

### Data Lineage
[Insert data lineage image]

## Methodology
### Tools Used
- Paradime: SQL and dbt™ development
- MotherDuck: Data storage and computing
- Hex: Data visualization
- [Other tools]

### Applied Techniques
- [List key techniques and practices used]

## Insights

### Insight 1
- Title
- Visualization
- Analysis

[Repeat for additional insights]

## Conclusions
[Summarize key findings and their implications]