Move downloading of data files for examples into the build scripts and just point the users to where these files are located instead of adding url lib requests to the python examples so we can focus on what is most important to the user
timsaucer committed Nov 24, 2024
1 parent cdfb5a8 commit eba8f6c
Showing 10 changed files with 26 additions and 52 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/docs.yaml
@@ -75,6 +75,8 @@ jobs:
 set -x
 source venv/bin/activate
 cd docs
+curl -O https://gist.githubusercontent.com/ritchie46/cac6b337ea52281aa23c049250a4ff03/raw/89a957ff3919d90e6ef2d34235e6bf22304f3366/pokemon.csv
+curl -O https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2021-01.parquet
 make html
 - name: Copy & push the generated HTML
2 changes: 2 additions & 0 deletions docs/.gitignore
@@ -1,2 +1,4 @@
 pokemon.csv
 yellow_trip_data.parquet
+yellow_tripdata_2021-01.parquet

11 changes: 10 additions & 1 deletion docs/build.sh
@@ -19,8 +19,17 @@
 #

 set -e
+
+if [ ! -f pokemon.csv ]; then
+curl -O https://gist.githubusercontent.com/ritchie46/cac6b337ea52281aa23c049250a4ff03/raw/89a957ff3919d90e6ef2d34235e6bf22304f3366/pokemon.csv
+fi
+
+if [ ! -f yellow_tripdata_2021-01.parquet ]; then
+curl -O https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2021-01.parquet
+fi
+
 rm -rf build 2> /dev/null
 rm -rf temp 2> /dev/null
 mkdir temp
 cp -rf source/* temp/
-make SOURCEDIR=`pwd`/temp html
+make SOURCEDIR=`pwd`/temp html
28 changes: 7 additions & 21 deletions docs/source/index.rst
@@ -43,27 +43,13 @@ Example

 .. ipython:: python
-import datafusion
-from datafusion import col
-import pyarrow
-# create a context
-ctx = datafusion.SessionContext()
-# create a RecordBatch and a new DataFrame from it
-batch = pyarrow.RecordBatch.from_arrays(
-[pyarrow.array([1, 2, 3]), pyarrow.array([4, 5, 6])],
-names=["a", "b"],
-)
-df = ctx.create_dataframe([[batch]], name="batch_array")
-# create a new statement
-df = df.select(
-col("a") + col("b"),
-col("a") - col("b"),
-)
-df
+from datafusion import SessionContext
+ctx = SessionContext()
+df = ctx.read_csv("pokemon.csv")
+df.show()
 .. _toc.links:
2 changes: 1 addition & 1 deletion docs/source/user-guide/basics.rst
@@ -25,7 +25,7 @@ source file as described in the :ref:`Introduction <guide>`, the Pokemon data se

 .. ipython:: python
-from datafusion import SessionContext, functions as F
+from datafusion import SessionContext, col, functions as F
 ctx = SessionContext()
6 changes: 0 additions & 6 deletions docs/source/user-guide/common-operations/aggregations.rst
@@ -26,16 +26,10 @@ to form a single summary value. For performing an aggregation, DataFusion provid

 .. ipython:: python
-import urllib.request
 from datafusion import SessionContext
 from datafusion import col, lit
 from datafusion import functions as f
-urllib.request.urlretrieve(
-"https://gist.githubusercontent.com/ritchie46/cac6b337ea52281aa23c049250a4ff03/raw/89a957ff3919d90e6ef2d34235e6bf22304f3366/pokemon.csv",
-"pokemon.csv",
-)
 ctx = SessionContext()
 df = ctx.read_csv("pokemon.csv")
6 changes: 0 additions & 6 deletions docs/source/user-guide/common-operations/functions.rst
@@ -25,14 +25,8 @@ We'll use the pokemon dataset in the following examples.

 .. ipython:: python
-import urllib.request
 from datafusion import SessionContext
-urllib.request.urlretrieve(
-"https://gist.githubusercontent.com/ritchie46/cac6b337ea52281aa23c049250a4ff03/raw/89a957ff3919d90e6ef2d34235e6bf22304f3366/pokemon.csv",
-"pokemon.csv",
-)
 ctx = SessionContext()
 ctx.register_csv("pokemon", "pokemon.csv")
 df = ctx.table("pokemon")
11 changes: 4 additions & 7 deletions docs/source/user-guide/common-operations/select-and-filter.rst
@@ -21,18 +21,15 @@ Column Selections
 Use :py:func:`~datafusion.dataframe.DataFrame.select` for basic column selection.

 DataFusion can work with several file types, to start simple we can use a subset of the
-`TLC Trip Record Data <https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page>`_
+`TLC Trip Record Data <https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page>`_,
+which you can download `here <https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2021-01.parquet>`_.

 .. ipython:: python
-import urllib.request
-from datafusion import SessionContext
-urllib.request.urlretrieve("https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2021-01.parquet",
-"yellow_trip_data.parquet")
+from datafusion import SessionContext
 ctx = SessionContext()
-df = ctx.read_parquet("yellow_trip_data.parquet")
+df = ctx.read_parquet("yellow_tripdata_2021-01.parquet")
 df.select("trip_distance", "passenger_count")
 For mathematical or logical operations use :py:func:`~datafusion.col` to select columns, and give meaningful names to the resulting
6 changes: 0 additions & 6 deletions docs/source/user-guide/common-operations/windows.rst
@@ -30,16 +30,10 @@ We'll use the pokemon dataset (from Ritchie Vink) in the following examples.

 .. ipython:: python
-import urllib.request
 from datafusion import SessionContext
 from datafusion import col
 from datafusion import functions as f
-urllib.request.urlretrieve(
-"https://gist.githubusercontent.com/ritchie46/cac6b337ea52281aa23c049250a4ff03/raw/89a957ff3919d90e6ef2d34235e6bf22304f3366/pokemon.csv",
-"pokemon.csv",
-)
 ctx = SessionContext()
 df = ctx.read_csv("pokemon.csv")
4 changes: 0 additions & 4 deletions docs/source/user-guide/introduction.rst
@@ -52,10 +52,6 @@ options for data sources. For our first example, we demonstrate using a Pokemon
 can download
 `here <https://gist.githubusercontent.com/ritchie46/cac6b337ea52281aa23c049250a4ff03/raw/89a957ff3919d90e6ef2d34235e6bf22304f3366/pokemon.csv>`_.

-.. code-block:: shell
-curl -O https://gist.githubusercontent.com/ritchie46/cac6b337ea52281aa23c049250a4ff03/raw/89a957ff3919d90e6ef2d34235e6bf22304f3366/pokemon.csv
 With that file in place you can use the following python example to view the DataFrame in
 DataFusion.
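
Note for readers running the examples outside the docs build: the download-if-missing check that this commit adds to docs/build.sh could be approximated in Python roughly as below. This is only a sketch and not part of the commit. The helper name `fetch_if_missing` and its `fetch` parameter are hypothetical; only the URL and the `pokemon.csv` file name come from the diff.

```python
import os
from urllib.request import urlretrieve

# URL taken from the diff above.
POKEMON_URL = (
    "https://gist.githubusercontent.com/ritchie46/"
    "cac6b337ea52281aa23c049250a4ff03/raw/"
    "89a957ff3919d90e6ef2d34235e6bf22304f3366/pokemon.csv"
)


def fetch_if_missing(url, dest=None, fetch=urlretrieve):
    """Download url to dest unless dest already exists.

    Hypothetical helper mirroring the `[ ! -f FILE ]` guard in build.sh.
    Returns True if a download was attempted, False if the file was present.
    """
    if dest is None:
        # Use the last path segment of the URL, as `curl -O` does.
        dest = url.rsplit("/", 1)[-1]
    if os.path.isfile(dest):
        return False
    fetch(url, dest)
    return True


# Example call (would write pokemon.csv into the current directory):
# fetch_if_missing(POKEMON_URL)
```

The `fetch` argument exists only so the download step can be swapped out, for example in an offline test; with the default it calls `urllib.request.urlretrieve`.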
