Commit 9bb0928

Merge branch 'pandas-dev:main' into bug-unhashable-columns

2 parents d45f6dc + 5909621

32 files changed: +972 −627 lines

README.md
Lines changed: 1 addition & 1 deletion

@@ -5,7 +5,7 @@
 
 -----------------
 
-# pandas: powerful Python data analysis toolkit
+# pandas: A Powerful Python Data Analysis Toolkit
 
 | | |
 | --- | --- |

ci/deps/actions-310-minimum_versions.yaml
Lines changed: 1 addition & 1 deletion

@@ -18,7 +18,7 @@ dependencies:
 - pytest-xdist>=3.4.0
 - pytest-localserver>=0.8.1
 - pytest-qt>=4.4.0
-- boto3
+- boto3=1.37.3
 
 # required dependencies
 - python-dateutil=2.8.2

ci/deps/actions-310.yaml
Lines changed: 1 addition & 1 deletion

@@ -16,7 +16,7 @@ dependencies:
 - pytest-xdist>=3.4.0
 - pytest-localserver>=0.8.1
 - pytest-qt>=4.4.0
-- boto3
+- boto3=1.37.3
 
 # required dependencies
 - python-dateutil

ci/deps/actions-311-downstream_compat.yaml
Lines changed: 1 addition & 1 deletion

@@ -17,7 +17,7 @@ dependencies:
 - pytest-xdist>=3.4.0
 - pytest-localserver>=0.8.1
 - pytest-qt>=4.4.0
-- boto3
+- boto3=1.37.3
 
 # required dependencies
 - python-dateutil

ci/deps/actions-311.yaml
Lines changed: 1 addition & 1 deletion

@@ -16,7 +16,7 @@ dependencies:
 - pytest-xdist>=3.4.0
 - pytest-localserver>=0.8.1
 - pytest-qt>=4.4.0
-- boto3
+- boto3=1.37.3
 
 # required dependencies
 - python-dateutil

ci/deps/actions-312.yaml
Lines changed: 1 addition & 1 deletion

@@ -16,7 +16,7 @@ dependencies:
 - pytest-xdist>=3.4.0
 - pytest-localserver>=0.8.1
 - pytest-qt>=4.4.0
-- boto3
+- boto3=1.37.3
 
 # required dependencies
 - python-dateutil

ci/deps/actions-313.yaml
Lines changed: 1 addition & 1 deletion

@@ -16,7 +16,7 @@ dependencies:
 - pytest-xdist>=3.4.0
 - pytest-localserver>=0.8.1
 - pytest-qt>=4.4.0
-- boto3
+- boto3=1.37.3
 
 # required dependencies
 - python-dateutil

doc/source/getting_started/install.rst
Lines changed: 1 addition & 1 deletion

@@ -308,7 +308,7 @@ Dependency Minimum Version pip ex
 `zlib <https://github.com/madler/zlib>`__ hdf5 Compression for HDF5
 `fastparquet <https://github.com/dask/fastparquet>`__ 2024.2.0 - Parquet reading / writing (pyarrow is default)
 `pyarrow <https://github.com/apache/arrow>`__ 10.0.1 parquet, feather Parquet, ORC, and feather reading / writing
-`PyIceberg <https://py.iceberg.apache.org/>`__ 0.7.1 iceberg Apache Iceberg reading
+`PyIceberg <https://py.iceberg.apache.org/>`__ 0.7.1 iceberg Apache Iceberg reading / writing
 `pyreadstat <https://github.com/Roche/pyreadstat>`__ 1.2.6 spss SPSS files (.sav) reading
 `odfpy <https://github.com/eea/odfpy>`__ 1.4.1 excel Open document format (.odf, .ods, .odt) reading / writing
 ====================================================== ================== ================ ==========================================================

doc/source/reference/io.rst
Lines changed: 1 addition & 0 deletions

@@ -162,6 +162,7 @@ Iceberg
    :toctree: api/
 
    read_iceberg
+   DataFrame.to_iceberg
 
 .. warning:: ``read_iceberg`` is experimental and may change without warning.
doc/source/user_guide/io.rst
Lines changed: 25 additions & 2 deletions

@@ -29,7 +29,7 @@ The pandas I/O API is a set of top level ``reader`` functions accessed like
     binary,`HDF5 Format <https://support.hdfgroup.org/documentation/hdf5/latest/_intro_h_d_f5.html>`__, :ref:`read_hdf<io.hdf5>`, :ref:`to_hdf<io.hdf5>`
     binary,`Feather Format <https://github.com/wesm/feather>`__, :ref:`read_feather<io.feather>`, :ref:`to_feather<io.feather>`
     binary,`Parquet Format <https://parquet.apache.org/>`__, :ref:`read_parquet<io.parquet>`, :ref:`to_parquet<io.parquet>`
-    binary,`Apache Iceberg <https://iceberg.apache.org/>`__, :ref:`read_iceberg<io.iceberg>` , NA
+    binary,`Apache Iceberg <https://iceberg.apache.org/>`__, :ref:`read_iceberg<io.iceberg>` , :ref:`to_iceberg<io.iceberg>`
     binary,`ORC Format <https://orc.apache.org/>`__, :ref:`read_orc<io.orc>`, :ref:`to_orc<io.orc>`
     binary,`Stata <https://en.wikipedia.org/wiki/Stata>`__, :ref:`read_stata<io.stata_reader>`, :ref:`to_stata<io.stata_writer>`
     binary,`SAS <https://en.wikipedia.org/wiki/SAS_(software)>`__, :ref:`read_sas<io.sas_reader>` , NA

@@ -5417,7 +5417,7 @@ engines to safely work with the same tables at the same time.
 
 Iceberg support predicate pushdown and column pruning, which are available to pandas
 users via the ``row_filter`` and ``selected_fields`` parameters of the :func:`~pandas.read_iceberg`
-function. This is convenient to extract from large tables a subset that fits in memory asa
+function. This is convenient to extract from large tables a subset that fits in memory as a
 pandas ``DataFrame``.
 
 Internally, pandas uses PyIceberg_ to query Iceberg.

@@ -5497,6 +5497,29 @@ parameter:
 Reading a particular snapshot is also possible providing the snapshot ID as an argument to
 ``snapshot_id``.
 
+To save a ``DataFrame`` to Iceberg, it can be done with the :meth:`DataFrame.to_iceberg`
+method:
+
+.. code-block:: python
+
+    df.to_iceberg("my_table", catalog_name="my_catalog")
+
+To specify the catalog, it works in the same way as for :func:`read_iceberg` with the
+``catalog_name`` and ``catalog_properties`` parameters.
+
+The location of the table can be specified with the ``location`` parameter:
+
+.. code-block:: python
+
+    df.to_iceberg(
+        "my_table",
+        catalog_name="my_catalog",
+        location="s://my-data-lake/my-iceberg-tables",
+    )
+
+It is possible to add properties to the table snapshot by passing a dictionary to the
+``snapshot_properties`` parameter.
+
 More information about the Iceberg format can be found in the `Apache Iceberg official
 page <https://iceberg.apache.org/>`__.
doc/source/whatsnew/v3.0.0.rst
Lines changed: 6 additions & 1 deletion

@@ -64,6 +64,8 @@ Other enhancements
 - :meth:`Series.nlargest` uses a 'stable' sort internally and will preserve original ordering.
 - :class:`ArrowDtype` now supports ``pyarrow.JsonType`` (:issue:`60958`)
 - :class:`DataFrameGroupBy` and :class:`SeriesGroupBy` methods ``sum``, ``mean``, ``median``, ``prod``, ``min``, ``max``, ``std``, ``var`` and ``sem`` now accept ``skipna`` parameter (:issue:`15675`)
+- :class:`Easter` has gained a new constructor argument ``method`` which specifies the method used to calculate Easter — for example, Orthodox Easter (:issue:`61665`)
+- :class:`Holiday` has gained the constructor argument and field ``exclude_dates`` to exclude specific datetimes from a custom holiday calendar (:issue:`54382`)
 - :class:`Rolling` and :class:`Expanding` now support ``nunique`` (:issue:`26958`)
 - :class:`Rolling` and :class:`Expanding` now support aggregations ``first`` and ``last`` (:issue:`33155`)
 - :func:`read_parquet` accepts ``to_pandas_kwargs`` which are forwarded to :meth:`pyarrow.Table.to_pandas` which enables passing additional keywords to customize the conversion to pandas, such as ``maps_as_pydicts`` to read the Parquet map data type as python dictionaries (:issue:`56842`)

@@ -79,7 +81,7 @@ Other enhancements
 - :py:class:`frozenset` elements in pandas objects are now natively printed (:issue:`60690`)
 - Add ``"delete_rows"`` option to ``if_exists`` argument in :meth:`DataFrame.to_sql` deleting all records of the table before inserting data (:issue:`37210`).
 - Added half-year offset classes :class:`HalfYearBegin`, :class:`HalfYearEnd`, :class:`BHalfYearBegin` and :class:`BHalfYearEnd` (:issue:`60928`)
-- Added support to read from Apache Iceberg tables with the new :func:`read_iceberg` function (:issue:`61383`)
+- Added support to read and write from and to Apache Iceberg tables with the new :func:`read_iceberg` and :meth:`DataFrame.to_iceberg` functions (:issue:`61383`)
 - Errors occurring during SQL I/O will now throw a generic :class:`.DatabaseError` instead of the raw Exception type from the underlying driver manager library (:issue:`60748`)
 - Implemented :meth:`Series.str.isascii` and :meth:`Series.str.isascii` (:issue:`59091`)
 - Improved deprecation message for offset aliases (:issue:`60820`)

@@ -712,8 +714,10 @@ Timezones
 Numeric
 ^^^^^^^
 - Bug in :meth:`DataFrame.corr` where numerical precision errors resulted in correlations above ``1.0`` (:issue:`61120`)
+- Bug in :meth:`DataFrame.cov` raises a ``TypeError`` instead of returning potentially incorrect results or other errors (:issue:`53115`)
 - Bug in :meth:`DataFrame.quantile` where the column type was not preserved when ``numeric_only=True`` with a list-like ``q`` produced an empty result (:issue:`59035`)
 - Bug in :meth:`Series.dot` returning ``object`` dtype for :class:`ArrowDtype` and nullable-dtype data (:issue:`61375`)
+- Bug in :meth:`Series.std` and :meth:`Series.var` when using complex-valued data (:issue:`61645`)
 - Bug in ``np.matmul`` with :class:`Index` inputs raising a ``TypeError`` (:issue:`57079`)

@@ -884,6 +888,7 @@ Other
 - Bug in :func:`eval` with ``engine="numexpr"`` returning unexpected result for float division. (:issue:`59736`)
 - Bug in :func:`to_numeric` raising ``TypeError`` when ``arg`` is a :class:`Timedelta` or :class:`Timestamp` scalar. (:issue:`59944`)
 - Bug in :func:`unique` on :class:`Index` not always returning :class:`Index` (:issue:`57043`)
+- Bug in :meth:`DataFrame.apply` raising ``RecursionError`` when passing ``func=list[int]``. (:issue:`61565`)
 - Bug in :meth:`DataFrame.apply` where passing ``engine="numba"`` ignored ``args`` passed to the applied function (:issue:`58712`)
 - Bug in :meth:`DataFrame.eval` and :meth:`DataFrame.query` which caused an exception when using NumPy attributes via ``@`` notation, e.g., ``df.eval("@np.floor(a)")``. (:issue:`58041`)
 - Bug in :meth:`DataFrame.eval` and :meth:`DataFrame.query` which did not allow to use ``tan`` function. (:issue:`55091`)

pandas/_libs/lib.pyx
Lines changed: 2 additions & 1 deletion

@@ -2,6 +2,7 @@ from collections import abc
 from decimal import Decimal
 from enum import Enum
 from sys import getsizeof
+from types import GenericAlias
 from typing import (
     Literal,
     _GenericAlias,

@@ -1298,7 +1299,7 @@ cdef bint c_is_list_like(object obj, bint allow_sets) except -1:
     getattr(obj, "__iter__", None) is not None and not isinstance(obj, type)
     # we do not count strings/unicode/bytes as list-like
     # exclude Generic types that have __iter__
-    and not isinstance(obj, (str, bytes, _GenericAlias))
+    and not isinstance(obj, (str, bytes, _GenericAlias, GenericAlias))
     # exclude zero-dimensional duck-arrays, effectively scalars
     and not (hasattr(obj, "ndim") and obj.ndim == 0)
     # exclude sets if allow_sets is False

pandas/_libs/tslibs/offsets.pyi
Lines changed: 7 additions & 1 deletion

@@ -230,7 +230,13 @@ class FY5253Quarter(FY5253Mixin):
         variation: Literal["nearest", "last"] = ...,
     ) -> None: ...
 
-class Easter(SingleConstructorOffset): ...
+class Easter(SingleConstructorOffset):
+    def __init__(
+        self,
+        n: int = ...,
+        normalize: bool = ...,
+        method: int = ...,
+    ) -> None: ...
 
 class _CustomBusinessMonth(BusinessMixin):
     def __init__(

pandas/_libs/tslibs/offsets.pyx
Lines changed: 26 additions & 3 deletions

@@ -4520,6 +4520,12 @@ cdef class Easter(SingleConstructorOffset):
         The number of years represented.
     normalize : bool, default False
         Normalize start/end dates to midnight before generating date range.
+    method : int, default 3
+        The method used to calculate the date of Easter. Valid options are:
+        - 1 (EASTER_JULIAN): Original calculation in Julian calendar
+        - 2 (EASTER_ORTHODOX): Original method, date converted to Gregorian calendar
+        - 3 (EASTER_WESTERN): Revised method, in Gregorian calendar
+        These constants are defined in the `dateutil.easter` module.
 
     See Also
     --------

@@ -4532,15 +4538,32 @@ cdef class Easter(SingleConstructorOffset):
     Timestamp('2022-04-17 00:00:00')
     """
 
+    _attributes = tuple(["n", "normalize", "method"])
+
+    cdef readonly:
+        int method
+
+    from dateutil.easter import EASTER_WESTERN
+
+    def __init__(self, n=1, normalize=False, method=EASTER_WESTERN):
+        BaseOffset.__init__(self, n, normalize)
+
+        self.method = method
+
+        if method < 1 or method > 3:
+            raise ValueError(f"Method must be 1<=method<=3, got {method}")
+
     cpdef __setstate__(self, state):
+        from dateutil.easter import EASTER_WESTERN
         self.n = state.pop("n")
         self.normalize = state.pop("normalize")
+        self.method = state.pop("method", EASTER_WESTERN)
 
     @apply_wraps
     def _apply(self, other: datetime) -> datetime:
         from dateutil.easter import easter
 
-        current_easter = easter(other.year)
+        current_easter = easter(other.year, method=self.method)
         current_easter = datetime(
             current_easter.year, current_easter.month, current_easter.day
         )

@@ -4555,7 +4578,7 @@
 
         # NOTE: easter returns a datetime.date so we have to convert to type of
         # other
-        new = easter(other.year + n)
+        new = easter(other.year + n, method=self.method)
         new = datetime(
             new.year,
             new.month,

@@ -4573,7 +4596,7 @@
 
         from dateutil.easter import easter
 
-        return date(dt.year, dt.month, dt.day) == easter(dt.year)
+        return date(dt.year, dt.month, dt.day) == easter(dt.year, method=self.method)
 
 
 # ----------------------------------------------------------------------
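The `method` values accepted by the new `Easter` constructor are exactly the calculation constants from `dateutil.easter`, which pandas delegates to internally, so the effect can be previewed with `dateutil` alone (assuming `python-dateutil` is installed; no pandas 3.0 build required):

```python
import datetime

from dateutil.easter import (
    EASTER_JULIAN,    # 1: original computation, Julian calendar
    EASTER_ORTHODOX,  # 2: Julian computation mapped onto the Gregorian calendar
    EASTER_WESTERN,   # 3: revised Gregorian computation, the default
    easter,
)

# Western and Orthodox Easter usually fall on different Sundays.
western = easter(2024, method=EASTER_WESTERN)
orthodox = easter(2024, method=EASTER_ORTHODOX)

print(western)   # 2024-03-31
print(orthodox)  # 2024-05-05
```

With this commit applied, the same selection would be spelled through the offset, e.g. `pd.offsets.Easter(method=EASTER_ORTHODOX)`.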

pandas/core/frame.py
Lines changed: 56 additions & 0 deletions

@@ -3547,6 +3547,62 @@ def to_xml(
 
         return xml_formatter.write_output()
 
+    def to_iceberg(
+        self,
+        table_identifier: str,
+        catalog_name: str | None = None,
+        *,
+        catalog_properties: dict[str, Any] | None = None,
+        location: str | None = None,
+        append: bool = False,
+        snapshot_properties: dict[str, str] | None = None,
+    ) -> None:
+        """
+        Write a DataFrame to an Apache Iceberg table.
+
+        .. versionadded:: 3.0.0
+
+        .. warning::
+
+            to_iceberg is experimental and may change without warning.
+
+        Parameters
+        ----------
+        table_identifier : str
+            Table identifier.
+        catalog_name : str, optional
+            The name of the catalog.
+        catalog_properties : dict of {str: str}, optional
+            The properties that are used next to the catalog configuration.
+        location : str, optional
+            Location for the table.
+        append : bool, default False
+            If ``True``, append data to the table, instead of replacing the content.
+        snapshot_properties : dict of {str: str}, optional
+            Custom properties to be added to the snapshot summary
+
+        See Also
+        --------
+        read_iceberg : Read an Apache Iceberg table.
+        DataFrame.to_parquet : Write a DataFrame in Parquet format.
+
+        Examples
+        --------
+        >>> df = pd.DataFrame(data={"col1": [1, 2], "col2": [4, 3]})
+        >>> df.to_iceberg("my_table", catalog_name="my_catalog")  # doctest: +SKIP
+        """
+        from pandas.io.iceberg import to_iceberg
+
+        to_iceberg(
+            self,
+            table_identifier,
+            catalog_name,
+            catalog_properties=catalog_properties,
+            location=location,
+            append=append,
+            snapshot_properties=snapshot_properties,
+        )
+
     # ----------------------------------------------------------------------
     @doc(INFO_DOCSTRING, **frame_sub_kwargs)
     def info(

pandas/core/nanops.py
Lines changed: 5 additions & 1 deletion

@@ -1014,7 +1014,11 @@ def nanvar(
     avg = _ensure_numeric(values.sum(axis=axis, dtype=np.float64)) / count
     if axis is not None:
         avg = np.expand_dims(avg, axis)
-    sqr = _ensure_numeric((avg - values) ** 2)
+    if values.dtype.kind == "c":
+        # Need to use absolute value for complex numbers.
+        sqr = _ensure_numeric(abs(avg - values) ** 2)
+    else:
+        sqr = _ensure_numeric((avg - values) ** 2)
     if mask is not None:
         np.putmask(sqr, mask, 0)
     result = sqr.sum(axis=axis, dtype=np.float64) / d
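The complex-dtype branch added to `nanvar` matters because squaring complex deviations directly can produce a negative, complex "variance"; `abs(...) ** 2` yields the real, non-negative spread that NumPy's own `np.var` uses for complex input. A small NumPy illustration of the two computations (not pandas code):

```python
import numpy as np

values = np.array([1 + 1j, 1 - 1j])
avg = values.mean()  # (1+0j)

# Old behavior: deviations are ±1j, and (±1j) ** 2 == -1,
# so the "variance" comes out negative.
naive = ((avg - values) ** 2).sum() / values.size

# Fixed behavior: |±1j| ** 2 == 1, giving the correct spread.
fixed = (np.abs(avg - values) ** 2).sum() / values.size

print(naive)  # (-1+0j)
print(fixed)  # 1.0, matching np.var(values)
```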
