
TickerTick stock news dataset 2023-11-23

@hczhu hczhu released this 27 Nov 02:59
· 9 commits to master since this release
763923c

Use the following link to download the dataset: Dataset Download Link
The dataset contains close to 8 million news stories. The dataset file stores one stock news story per line as a JSON object, in reverse chronological order (newest first). An example news story, pretty-printed as multi-line JSON, is shown below:

{
  "title": "Europe gives Meta, TikTok six days to share information on response to Israel-Hamas conflict",
  "url": "https://www.cnbc.com/2023/10/19/israel-hamas-eu-gives-meta-tiktok-six-days-to-provide-information.html",
  "unix_timestamp": 1697727889,
  "id": "3341850707742811898",
  "tickers_direct": [
    "meta",
    "fb"
  ],
  "tickers_indirect": [
    ".bytedance"
  ],
  "description": "The EU said it would like Meta and TikTok to hand over information on how they're tackling misinformation about the Israel-Hamas war."
}

The fields of the JSON object are explained below. Most of the fields have the same semantics as the corresponding fields in the TickerTick API response.

| Field name | Meaning | Optional field? (If yes, this field can be missing) |
|---|---|---|
| title | The title of this news story | No |
| url | The original URL for the full news story | No |
| unix_timestamp | The UNIX timestamp when the news was reported | No |
| id | A unique string ID of this news story | No |
| description | A short description of this news story | Yes |
| tickers_direct | The tickers that the news story is directly about, e.g., the name of the company for the ticker is mentioned | Yes |
| tickers_indirect | The tickers that the news story is indirectly about, e.g., the CEO or a product of the company for this ticker is mentioned | Yes |
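
Because three of the fields may be missing, code that reads the file should not index them directly. The following is a minimal Python sketch of reading the one-story-per-line format, not an official loader. The function names `parse_story` and `iter_stories` are made up for illustration, and the sketch assumes `unix_timestamp` is in seconds, as in the example story.

```python
import json
from datetime import datetime, timezone


def parse_story(line):
    """Parse one line of the dataset file, filling defaults for optional fields."""
    story = json.loads(line)
    return {
        # Required fields: a KeyError here means the line is malformed.
        "id": story["id"],
        "title": story["title"],
        "url": story["url"],
        # Assumption: the timestamp is in seconds since the UNIX epoch.
        "time": datetime.fromtimestamp(story["unix_timestamp"], tz=timezone.utc),
        # Optional fields: fall back to an empty value when missing.
        "description": story.get("description", ""),
        "tickers_direct": story.get("tickers_direct", []),
        "tickers_indirect": story.get("tickers_indirect", []),
    }


def iter_stories(lines):
    """Yield parsed stories from an iterable of lines, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield parse_story(line)
```

Passing an open file object to `iter_stories` (for example, one returned by `open(path, encoding="utf-8")`, where `path` is wherever the downloaded file was saved) streams the stories one at a time, which avoids loading all of the roughly 8 million lines into memory.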

Note that many well-known pre-IPO startups (e.g., Bytedance, the parent company of TikTok) have made-up tickers like .bytedance and .databricks.
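
If the made-up tickers need to be separated from real ones, a rough sketch follows. It assumes that every made-up ticker starts with a dot, which the two examples above suggest but the release notes do not state explicitly. The function name `split_tickers` is hypothetical.

```python
def split_tickers(tickers):
    """Split a ticker list into (listed, made_up) by the leading-dot convention.

    Assumption: made-up tickers for pre-IPO companies always start with ".".
    """
    listed = [t for t in tickers if not t.startswith(".")]
    made_up = [t for t in tickers if t.startswith(".")]
    return listed, made_up
```

For the example story, applying this to the combined `tickers_direct` and `tickers_indirect` lists would place `meta` and `fb` in the first list and `.bytedance` in the second.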