What’s new in 1.4.0 (January 22, 2022)#
These are the changes in pandas 1.4.0. See Release notes for a full changelog including other versions of pandas.
Enhancements#
Improved warning messages#
Previously, warning messages may have pointed to lines within the pandas
library. Running the script setting_with_copy_warning.py
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3]})
df[:2].loc[:, 'a'] = 5
with pandas 1.3 resulted in:
.../site-packages/pandas/core/indexing.py:1951: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
This made it difficult to determine where the warning was being generated from. Now pandas will inspect the call stack, reporting the first line outside of the pandas library that gave rise to the warning. The output of the above script is now:
setting_with_copy_warning.py:4: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Index can hold arbitrary ExtensionArrays#
Until now, passing a custom ExtensionArray to pd.Index would cast the array to object dtype. Now Index can directly hold arbitrary ExtensionArrays (GH43930).
Previous behavior:
In [1]: arr = pd.array([1, 2, pd.NA])
In [2]: idx = pd.Index(arr)
In the old behavior, idx would be object-dtype:
In [1]: idx
Out[1]: Index([1, 2, <NA>], dtype='object')
With the new behavior, we keep the original dtype:
New behavior:
In [3]: idx
Out[3]: Index([1, 2, <NA>], dtype='Int64')
One exception to this is SparseArray, which will continue to cast to numpy dtype until pandas 2.0. At that point it will retain its dtype like other ExtensionArrays.
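The dtype retention described above can be checked with a short sketch. It assumes pandas 1.4 or later and uses the nullable Int64 dtype as one example of an ExtensionArray; the variable names are illustrative only.

```python
import pandas as pd

# A nullable-integer ExtensionArray containing a missing value.
arr = pd.array([1, 2, pd.NA], dtype="Int64")
idx = pd.Index(arr)

# The Index keeps the extension dtype instead of casting to object.
print(idx.dtype)

# The dtype is also kept when the Index labels a Series.
ser = pd.Series([10, 20, 30], index=idx)
print(ser.index.dtype)
```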
Styler#
Styler has been further developed in 1.4.0. The following general enhancements have been made:
- Styling and formatting of indexes has been added, with Styler.apply_index(), Styler.applymap_index() and Styler.format_index(). These mirror the signature of the methods already used to style and format data values, and work with HTML, LaTeX and Excel formats (GH41893, GH43101, GH41993, GH41995)
- The new method Styler.hide() deprecates Styler.hide_index() and Styler.hide_columns() (GH43758)
- The keyword arguments level and names have been added to Styler.hide() (and implicitly to the deprecated methods Styler.hide_index() and Styler.hide_columns()) for additional control of visibility of MultiIndexes and of Index names (GH25475, GH43404, GH43346)
- Styler.export() and Styler.use() have been updated to address all of the added functionality from v1.2.0 and v1.3.0 (GH40675)
- Global options under the category pd.options.styler have been extended to configure default Styler properties which address formatting, encoding, and HTML and LaTeX rendering. Note that formerly Styler relied on display.html.use_mathjax, which has now been replaced by styler.html.mathjax (GH41395)
- Validation of certain keyword arguments, e.g. caption (GH43368)
- Various bug fixes as recorded below
Additionally there are specific enhancements to the HTML specific rendering:
- Styler.bar() introduces additional arguments to control alignment and display (GH26070, GH36419), and it also validates the input arguments width and height (GH42511)
- Styler.to_html() introduces keyword arguments sparse_index, sparse_columns, bold_headers, caption, max_rows and max_columns (GH41946, GH43149, GH42972)
- Styler.to_html() omits CSSStyle rules for hidden table elements as a performance enhancement (GH43619)
- Custom CSS classes can now be directly specified without string replacement (GH43686)
- Ability to render hyperlinks automatically via a new hyperlinks formatting keyword argument (GH45058)
There are also some LaTeX specific enhancements:
- Styler.to_latex() introduces keyword argument environment, which also allows a specific “longtable” entry through a separate jinja2 template (GH41866)
- Naive sparsification is now possible for LaTeX without the necessity of including the multirow package (GH43369)
- cline support has been added for MultiIndex row sparsification through a keyword argument (GH45138)
Multi-threaded CSV reading with a new CSV Engine based on pyarrow#
pandas.read_csv() now accepts engine="pyarrow" (requires at least pyarrow 1.0.1) as an argument, allowing for faster CSV parsing on multicore machines with pyarrow installed. See the I/O docs for more info. (GH23697, GH43706)
Rank function for rolling and expanding windows#
Added rank function to Rolling and Expanding. The new function supports the method, ascending, and pct flags of DataFrame.rank(). The method argument supports min, max, and average ranking methods.
Example:
In [4]: s = pd.Series([1, 4, 2, 3, 5, 3])
In [5]: s.rolling(3).rank()
Out[5]:
0 NaN
1 NaN
2 2.0
3 2.0
4 3.0
5 1.5
dtype: float64
In [6]: s.rolling(3).rank(method="max")
Out[6]:
0 NaN
1 NaN
2 2.0
3 2.0
4 3.0
5 2.0
dtype: float64
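As a further sketch on the same data, the min method and an expanding window can be compared with the outputs above. This assumes pandas 1.4 or later; the variable names are illustrative.

```python
import pandas as pd

s = pd.Series([1, 4, 2, 3, 5, 3])

# "min" assigns tied values the lowest rank in the window, so the last
# window [3, 5, 3] ranks its final element 1.0 rather than 1.5 or 2.0.
rolling_min = s.rolling(3).rank(method="min")

# An expanding window ranks each value against everything seen so far,
# using the default "average" method for ties.
expanding_avg = s.expanding().rank()

print(rolling_min)
print(expanding_avg)
```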
Groupby positional indexing#
It is now possible to specify positional ranges relative to the ends of each group.
Negative arguments for GroupBy.head() and GroupBy.tail() now work correctly and result in ranges relative to the end and start of each group, respectively. Previously, negative arguments returned empty frames.
In [7]: df = pd.DataFrame([["g", "g0"], ["g", "g1"], ["g", "g2"], ["g", "g3"],
...: ["h", "h0"], ["h", "h1"]], columns=["A", "B"])
...:
In [8]: df.groupby("A").head(-1)
Out[8]:
A B
0 g g0
1 g g1
2 g g2
4 h h0
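The complementary case, tail with a negative argument, can be sketched the same way. This is an illustrative example on the same frame, assuming pandas 1.4 or later: tail(-1) keeps everything except the first row of each group.

```python
import pandas as pd

df = pd.DataFrame(
    [["g", "g0"], ["g", "g1"], ["g", "g2"], ["g", "g3"], ["h", "h0"], ["h", "h1"]],
    columns=["A", "B"],
)

# All rows except the first row of each group, in original order.
all_but_first = df.groupby("A").tail(-1)
print(all_but_first)
```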
GroupBy.nth() now accepts a slice or list of integers and slices.
In [9]: df.groupby("A").nth(slice(1, -1))
Out[9]:
B
A
g g1
g g2
In [10]: df.groupby("A").nth([slice(None, 1), slice(-1, None)])
Out[10]:
B
A
g g0
g g3
h h0
h h1
GroupBy.nth() now accepts index notation.
In [11]: df.groupby("A").nth[1, -1]
Out[11]:
B
A
g g1
g g3
h h1
In [12]: df.groupby("A").nth[1:-1]
Out[12]:
B
A
g g1
g g2
In [13]: df.groupby("A").nth[:1, -1:]
Out[13]:
B
A
g g0
g g3
h h0
h h1
DataFrame.from_dict and DataFrame.to_dict have new 'tight' option#
A new 'tight' dictionary format that preserves MultiIndex entries and names is now available with the DataFrame.from_dict() and DataFrame.to_dict() methods and can be used with the standard json library to produce a tight representation of DataFrame objects (GH4889).
In [14]: df = pd.DataFrame.from_records(
....: [[1, 3], [2, 4]],
....: index=pd.MultiIndex.from_tuples([("a", "b"), ("a", "c")],
....: names=["n1", "n2"]),
....: columns=pd.MultiIndex.from_tuples([("x", 1), ("y", 2)],
....: names=["z1", "z2"]),
....: )
....:
In [15]: df
Out[15]:
z1 x y
z2 1 2
n1 n2
a b 1 3
c 2 4
In [16]: df.to_dict(orient='tight')
Out[16]:
{'index': [('a', 'b'), ('a', 'c')],
'columns': [('x', 1), ('y', 2)],
'data': [[1, 3], [2, 4]],
'index_names': ['n1', 'n2'],
'column_names': ['z1', 'z2']}
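A round trip through the 'tight' format can be sketched as follows. This assumes pandas 1.4 or later and rebuilds the same frame with the plain DataFrame constructor; the names tight and restored are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 3], [2, 4]],
    index=pd.MultiIndex.from_tuples([("a", "b"), ("a", "c")], names=["n1", "n2"]),
    columns=pd.MultiIndex.from_tuples([("x", 1), ("y", 2)], names=["z1", "z2"]),
)

# Serialize to the 'tight' dictionary and rebuild the frame from it.
tight = df.to_dict(orient="tight")
restored = pd.DataFrame.from_dict(tight, orient="tight")

# Both the MultiIndex entries and their level names survive.
print(restored.index.names, restored.columns.names)
```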
Other enhancements#
- concat() will preserve the attrs when it is the same for all objects and discard the attrs when they are different (GH41828)
- DataFrameGroupBy operations with as_index=False now correctly retain ExtensionDtype dtypes for columns being grouped on (GH41373)
- Add support for assigning values to by argument in DataFrame.plot.hist() and DataFrame.plot.box() (GH15079)
- Series.sample(), DataFrame.sample(), and GroupBy.sample() now accept a np.random.Generator as input to random_state. A generator will be more performant, especially with replace=False (GH38100)
- Series.ewm() and DataFrame.ewm() now support a method argument with a 'table' option that performs the windowing operation over an entire DataFrame. See Window Overview for performance and functional benefits (GH42273)
- GroupBy.cummin() and GroupBy.cummax() now support the argument skipna (GH34047)
- read_table() now supports the argument storage_options (GH39167)
- DataFrame.to_stata() and StataWriter() now accept the keyword-only argument value_labels to save labels for non-categorical columns (GH38454)
- Methods that relied on hashmap-based algorithms, such as DataFrameGroupBy.value_counts(), DataFrameGroupBy.count() and factorize(), previously ignored the imaginary component of complex numbers (GH17927)
- Add Series.str.removeprefix() and Series.str.removesuffix() introduced in Python 3.9 to remove pre-/suffixes from string-type Series (GH36944)
- Attempting to write into a file in a missing parent directory with DataFrame.to_csv(), DataFrame.to_html(), DataFrame.to_excel(), DataFrame.to_feather(), DataFrame.to_parquet(), DataFrame.to_stata(), DataFrame.to_json(), DataFrame.to_pickle(), and DataFrame.to_xml() now explicitly mentions the missing parent directory; the same is true for the Series counterparts (GH24306)
- Indexing with .loc and .iloc now supports Ellipsis (GH37750)
- IntegerArray.all(), IntegerArray.any(), FloatingArray.any(), and FloatingArray.all() use Kleene logic (GH41967)
- Added support for nullable boolean and integer types in DataFrame.to_stata(), StataWriter, StataWriter117, and StataWriterUTF8 (GH40855)
- DataFrame.__pos__() and DataFrame.__neg__() now retain ExtensionDtype dtypes (GH43883)
- The error raised when an optional dependency can’t be imported now includes the original exception, for easier investigation (GH43882)
- Added ExponentialMovingWindow.sum() (GH13297)
- Series.str.split() now supports a regex argument that explicitly specifies whether the pattern is a regular expression. Default is None (GH43563, GH32835, GH25549)
- DataFrame.dropna() now accepts a single label as subset along with array-like (GH41021)
- Added DataFrameGroupBy.value_counts() (GH43564)
- read_csv() now accepts a callable function in on_bad_lines when engine="python" for custom handling of bad lines (GH5686)
- ExcelWriter argument if_sheet_exists="overlay" option added (GH40231)
- read_excel() now accepts a decimal argument that allows the user to specify the decimal point when parsing string columns to numeric (GH14403)
- GroupBy.mean(), GroupBy.std(), GroupBy.var(), and GroupBy.sum() now support Numba execution with the engine keyword (GH43731, GH44862, GH44939)
- Timestamp.isoformat() now handles the timespec argument from the base datetime class (GH26131)
- NaT.to_numpy() dtype argument is now respected, so np.timedelta64 can be returned (GH44460)
- New option display.max_dir_items customizes the number of columns added to DataFrame.__dir__() and suggested for tab completion (GH37996)
- Added “Juneteenth National Independence Day” to USFederalHolidayCalendar (GH44574)
- Rolling.var(), Expanding.var(), Rolling.std(), and Expanding.std() now support Numba execution with the engine keyword (GH44461)
- Series.info() has been added, for compatibility with DataFrame.info() (GH5167)
- Implemented IntervalArray.min() and IntervalArray.max(), as a result of which min and max now work for IntervalIndex, Series and DataFrame with IntervalDtype (GH44746)
- UInt64Index.map() now retains dtype where possible (GH44609)
- read_json() can now parse unsigned long long integers (GH26068)
- DataFrame.take() now raises a TypeError when passed a scalar for the indexer (GH42875)
- is_list_like() now identifies duck-arrays as list-like unless .ndim == 0 (GH35131)
- ExtensionDtype and ExtensionArray are now (de)serialized when exporting a DataFrame with DataFrame.to_json() using orient='table' (GH20612, GH44705)
- Add support for Zstandard compression to DataFrame.to_pickle()/read_pickle() and friends (GH43925)
- DataFrame.to_sql() now returns an int of the number of written rows (GH23998)
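A few of the enhancements listed above can be tried together in a minimal sketch. It assumes pandas 1.4 or later with NumPy available; the sample data and variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Series.str.removeprefix() / removesuffix() mirror the Python 3.9 str methods.
files = pd.Series(["tmp_report.csv", "tmp_data.csv"])
stems = files.str.removeprefix("tmp_").str.removesuffix(".csv")

# DataFrame.dropna() accepts a single label for subset.
df = pd.DataFrame({"a": [1.0, None, 3.0], "b": [None, 5.0, 6.0]})
kept = df.dropna(subset="a")

# sample() accepts a np.random.Generator as random_state.
rng = np.random.default_rng(0)
picked = df.sample(n=2, random_state=rng)

# Kleene logic: with skipna=False, all() over [1, NA] is unknown (NA),
# while any() is already decided by the truthy 1.
arr = pd.array([1, pd.NA], dtype="Int64")
all_strict = arr.all(skipna=False)
any_strict = arr.any(skipna=False)
```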
Notable bug fixes#
These are bug fixes that might have notable behavior changes.
Inconsistent date string parsing#
The dayfirst option of to_datetime() isn’t strict, and this can lead to surprising behavior:
In [17]: pd.to_datetime(["31-12-2021"], dayfirst=False)
Out[17]: DatetimeIndex(['2021-12-31'], dtype='datetime64[ns]', freq=None)
Now, a warning will be raised if a date string cannot be parsed in accordance with the given dayfirst value when the value is a delimited date string (e.g. 31-12-2012).
Ignoring dtypes in concat with empty or all-NA columns#
When using concat() to concatenate two or more DataFrame objects, if one of the DataFrames was empty or had all-NA values, its dtype was sometimes ignored when finding the concatenated dtype. These are now consistently not ignored (GH43507).
In [18]: df1 = pd.DataFrame({"bar": [pd.Timestamp("2013-01-01")]}, index=range(1))
In [19]: df2 = pd.DataFrame({"bar": np.nan}, index=range(1, 2))
In [20]: res = pd.concat([df1, df2])
Previously, the float-dtype in df2 would be ignored so the result dtype would be datetime64[ns]. As a result, the np.nan would be cast to NaT.
Previous behavior:
In [4]: res
Out[4]:
bar
0 2013-01-01
1 NaT
Now the float-dtype is respected. Since the common dtype for these DataFrames is object, the np.nan is retained.
New behavior:
In [21]: res
Out[21]:
bar
0 2013-01-01 00:00:00
1 NaN
Null-values are no longer coerced to NaN-value in value_counts and mode#
Series.value_counts() and Series.mode() no longer coerce None, NaT and other null-values to a NaN-value for np.object-dtype. This behavior is now consistent with unique, isin and others (GH42688).
In [22]: s = pd.Series([True, None, pd.NaT, None, pd.NaT, None])
In [23]: res = s.value_counts(dropna=False)
Previously, all null-values were replaced by a NaN-value.
Previous behavior:
In [3]: res
Out[3]:
NaN 5
True 1
dtype: int64
Now null-values are no longer mangled.
New behavior:
In [24]: res
Out[24]:
None 3
NaT 2
True 1
dtype: int64
mangle_dupe_cols in read_csv no longer renames unique columns conflicting with target names#
read_csv()
no longer renames unique column labels which conflict with the target
names of duplicated columns. Already existing columns are skipped, i.e. the next
available index is used for the target column name (GH14704).
In [25]: import io
In [26]: data = "a,a,a.1\n1,2,3"
In [27]: res = pd.read_csv(io.StringIO(data))
Previously, the second column was called a.1, while the third column was also renamed to a.1.1.
Previous behavior:
In [3]: res
Out[3]:
a a.1 a.1.1
0 1 2 3
Now the renaming checks if a.1 already exists when changing the name of the second column and skips this index. The second column is instead renamed to a.2.
New behavior:
In [28]: res
Out[28]:
a a.2 a.1
0 1 2 3
unstack and pivot_table no longer raises ValueError for result that would exceed int32 limit#
Previously DataFrame.pivot_table() and DataFrame.unstack() would raise a ValueError if the operation could produce a result with more than 2**31 - 1 elements. This operation now raises an errors.PerformanceWarning instead (GH26314).
Previous behavior:
In [3]: df = pd.DataFrame({"ind1": np.arange(2 ** 16), "ind2": np.arange(2 ** 16), "count": 0})
In [4]: df.pivot_table(index="ind1", columns="ind2", values="count", aggfunc="count")
ValueError: Unstacked DataFrame is too big, causing int32 overflow
New behavior:
In [4]: df.pivot_table(index="ind1", columns="ind2", values="count", aggfunc="count")
PerformanceWarning: The following operation may generate 4294967296 cells in the resulting pandas object.
groupby.apply consistent transform detection#
GroupBy.apply()
is designed to be flexible, allowing users to perform
aggregations, transformations, filters, and use it with user-defined functions
that might not fall into any of these categories. As part of this, apply will
attempt to detect when an operation is a transform, and in such a case, the
result will have the same index as the input. In order to determine if the
operation is a transform, pandas compares the input’s index to the result’s and
determines if it has been mutated. Previously in pandas 1.3, different code
paths used different definitions of “mutated”: some would use Python’s is whereas others would test only up to equality.
This inconsistency has been removed; pandas now tests up to equality.
In [29]: def func(x):
....: return x.copy()
....:
In [30]: df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})
In [31]: df
Out[31]:
a b c
0 1 3 5
1 2 4 6
Previous behavior:
In [3]: df.groupby(['a']).apply(func)
Out[3]:
a b c
a
1 0 1 3 5
2 1 2 4 6
In [4]: df.set_index(['a', 'b']).groupby(['a']).apply(func)
Out[4]:
c
a b
1 3 5
2 4 6
In the examples above, the first uses a code path where pandas uses is and determines that func is not a transform whereas the second tests up to equality and determines that func is a transform. In the first case, the result’s index is not the same as the input’s.
New behavior:
In [32]: df.groupby(['a']).apply(func)
Out[32]:
a b c
0 1 3 5
1 2 4 6
In [33]: df.set_index(['a', 'b']).groupby(['a']).apply(func)
Out[33]:
c
a b
1 3 5
2 4 6
Now in both cases it is determined that func is a transform. In each case, the result has the same index as the input.
Backwards incompatible API changes#
Increased minimum version for Python#
pandas 1.4.0 supports Python 3.8 and higher.
Increased minimum versions for dependencies#
Some minimum supported versions of dependencies were updated. If installed, we now require:
| Package | Minimum Version | Required | Changed |
|---|---|---|---|
| numpy | 1.18.5 | X | X |
| pytz | 2020.1 | X | X |
| python-dateutil | 2.8.1 | X | X |
| bottleneck | 1.3.1 | | X |
| numexpr | 2.7.1 | | X |
| pytest (dev) | 6.0 | | |
| mypy (dev) | 0.930 | | X |
For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.
| Package | Minimum Version | Changed |
|---|---|---|
| beautifulsoup4 | 4.8.2 | X |
| fastparquet | 0.4.0 | |
| fsspec | 0.7.4 | |
| gcsfs | 0.6.0 | |
| lxml | 4.5.0 | X |
| matplotlib | 3.3.2 | X |
| numba | 0.50.1 | X |
| openpyxl | 3.0.3 | X |
| pandas-gbq | 0.14.0 | X |
| pyarrow | 1.0.1 | X |
| pymysql | 0.10.1 | X |
| pytables | 3.6.1 | X |
| s3fs | 0.4.0 | |
| scipy | 1.4.1 | X |
| sqlalchemy | 1.4.0 | X |
| tabulate | 0.8.7 | |
| xarray | 0.15.1 | X |
| xlrd | 2.0.1 | X |
| xlsxwriter | 1.2.2 | X |
| xlwt | 1.3.0 | |
See Dependencies and Optional dependencies for more.
Other API changes#
- Index.get_indexer_for() no longer accepts keyword arguments (other than target); in the past these would be silently ignored if the index was not unique (GH42310)
- Change in the position of the min_rows argument in DataFrame.to_string() due to change in the docstring (GH44304)
- Reduction operations for DataFrame or Series now raise a ValueError when None is passed for skipna (GH44178)
- read_csv() and read_html() no longer raise an error when one of the header rows consists only of Unnamed: columns (GH13054)
- Changed the name attribute of several holidays in USFederalHolidayCalendar to match official federal holiday names, specifically:
  - “New Year’s Day” gains the possessive apostrophe
  - “Presidents Day” becomes “Washington’s Birthday”
  - “Martin Luther King Jr. Day” is now “Birthday of Martin Luther King, Jr.”
  - “July 4th” is now “Independence Day”
  - “Thanksgiving” is now “Thanksgiving Day”
  - “Christmas” is now “Christmas Day”
  - Added “Juneteenth National Independence Day”
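Code that looks holidays up by name may need updating after the renames above. The following sketch, assuming pandas 1.4 or later, lists the rule names so they can be checked against whatever names existing code expects.

```python
from pandas.tseries.holiday import USFederalHolidayCalendar

cal = USFederalHolidayCalendar()

# Each rule carries the holiday name used in return_name=True output.
names = [rule.name for rule in cal.rules]
print(names)

# Observed holidays for one year, with names as the values.
holidays_2022 = cal.holidays(start="2022-01-01", end="2022-12-31", return_name=True)
print(holidays_2022)
```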
Deprecations#
Deprecated Int64Index, UInt64Index & Float64Index#
Int64Index, UInt64Index and Float64Index have been deprecated in favor of the base Index class and will be removed in pandas 2.0 (GH43028).
For constructing a numeric index, you can use the base Index class instead, specifying the data type (which will also work on older pandas releases):
# replace
pd.Int64Index([1, 2, 3])
# with
pd.Index([1, 2, 3], dtype="int64")
For checking the data type of an index object, you can replace isinstance checks with checking the dtype:
# replace
isinstance(idx, pd.Int64Index)
# with
idx.dtype == "int64"
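When the exact width does not matter, a dtype-based check can also be written with the helpers in pandas.api.types. This is a sketch only; is_integer_index is a hypothetical helper name, not part of pandas.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype


def is_integer_index(idx: pd.Index) -> bool:
    """Hypothetical helper: True for any integer-dtype index, regardless of class."""
    return is_integer_dtype(idx.dtype)


int_idx = pd.Index([1, 2, 3], dtype="int64")
float_idx = pd.Index([1.5, 2.5])
str_idx = pd.Index(["a", "b"])

print(is_integer_index(int_idx), is_integer_index(float_idx), is_integer_index(str_idx))
```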
Currently, in order to maintain backward compatibility, calls to Index will continue to return Int64Index, UInt64Index and Float64Index when given numeric data, but in the future, an Index will be returned.
Current behavior:
In [1]: pd.Index([1, 2, 3], dtype="int32")
Out [1]: Int64Index([1, 2, 3], dtype='int64')
In [2]: pd.Index([1, 2, 3], dtype="uint64")
Out [2]: UInt64Index([1, 2, 3], dtype='uint64')
Future behavior:
In [3]: pd.Index([1, 2, 3], dtype="int32")
Out [3]: Index([1, 2, 3], dtype='int32')
In [4]: pd.Index([1, 2, 3], dtype="uint64")
Out [4]: Index([1, 2, 3], dtype='uint64')
Deprecated DataFrame.append and Series.append#
DataFrame.append() and Series.append() have been deprecated and will be removed in a future version. Use pandas.concat() instead (GH35407).
Deprecated syntax
In [1]: pd.Series([1, 2]).append(pd.Series([3, 4]))
Out [1]:
<stdin>:1: FutureWarning: The series.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
0 1
1 2
0 3
1 4
dtype: int64
In [2]: df1 = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
In [3]: df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))
In [4]: df1.append(df2)
Out [4]:
<stdin>:1: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
A B
0 1 2
1 3 4
0 5 6
1 7 8
Recommended syntax
In [34]: pd.concat([pd.Series([1, 2]), pd.Series([3, 4])])
Out[34]:
0 1
1 2
0 3
1 4
dtype: int64
In [35]: df1 = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
In [36]: df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))
In [37]: pd.concat([df1, df2])
Out[37]:
A B
0 1 2
1 3 4
0 5 6
1 7 8
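A common use of append was growing a frame inside a loop. One way to migrate such code, sketched here with made-up data, is to collect the pieces in a list and call pandas.concat() once; ignore_index=True is an optional choice that renumbers the rows.

```python
import pandas as pd

# Collect the pieces first instead of appending to a frame on each iteration.
chunks = []
for start in range(0, 6, 2):
    chunks.append(pd.DataFrame({"A": [start, start + 1]}))

# A single concat call builds the final frame; ignore_index renumbers the rows.
combined = pd.concat(chunks, ignore_index=True)
print(combined)
```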
Other Deprecations#
- Deprecated Index.is_type_compatible() (GH42113)
- Deprecated method argument in Index.get_loc(), use index.get_indexer([label], method=...) instead (GH42269)
- Deprecated treating integer keys in Series.__setitem__() as positional when the index is a Float64Index not containing the key, an IntervalIndex with no entries containing the key, or a MultiIndex with leading Float64Index level not containing the key (GH33469)
- Deprecated treating numpy.datetime64 objects as UTC times when passed to the Timestamp constructor along with a timezone. In a future version, these will be treated as wall-times. To retain the old behavior, use Timestamp(dt64).tz_localize("UTC").tz_convert(tz) (GH24559)
- Deprecated ignoring missing labels when indexing with a sequence of labels on a level of a MultiIndex (GH42351)
- Creating an empty Series without a dtype will now raise a more visible FutureWarning instead of a DeprecationWarning (GH30017)
- Deprecated the kind argument in Index.get_slice_bound(), Index.slice_indexer(), and Index.slice_locs(); in a future version passing kind will raise (GH42857)
- Deprecated dropping of nuisance columns in Rolling, Expanding, and EWM aggregations (GH42738)
- Deprecated Index.reindex() with a non-unique Index (GH42568)
- Deprecated Styler.render() in favor of Styler.to_html() (GH42140)
- Deprecated Styler.hide_index() and Styler.hide_columns() in favor of Styler.hide() (GH43758)
- Deprecated passing in a string column label into times in DataFrame.ewm() (GH43265)
- Deprecated the include_start and include_end arguments in DataFrame.between_time(); in a future version passing include_start or include_end will raise (GH40245)
- Deprecated the squeeze argument to read_csv(), read_table(), and read_excel(). Users should squeeze the DataFrame afterwards with .squeeze("columns") instead (GH43242)
- Deprecated the index argument to SparseArray construction (GH23089)
- Deprecated the closed argument in date_range() and bdate_range() in favor of the inclusive argument; in a future version passing closed will raise (GH40245)
- Deprecated Rolling.validate(), Expanding.validate(), and ExponentialMovingWindow.validate() (GH43665)
- Deprecated silent dropping of columns that raised a TypeError in Series.transform and DataFrame.transform when used with a dictionary (GH43740)
- Deprecated silent dropping of columns that raised a TypeError, DataError, and some cases of ValueError in Series.aggregate(), DataFrame.aggregate(), Series.groupby.aggregate(), and DataFrame.groupby.aggregate() when used with a list (GH43740)
- Deprecated casting behavior when setting timezone-aware value(s) into a timezone-aware Series or DataFrame column when the timezones do not match. Previously this cast to object dtype. In a future version, the values being inserted will be converted to the series or column’s existing timezone (GH37605)
- Deprecated casting behavior when passing an item with mismatched-timezone to DatetimeIndex.insert(), DatetimeIndex.putmask(), DatetimeIndex.where(), DatetimeIndex.fillna(), Series.mask(), Series.where(), Series.fillna(), Series.shift(), Series.replace(), Series.reindex() (and DataFrame column analogues). In the past this has cast to object dtype. In a future version, these will cast the passed item to the index or series’s timezone (GH37605, GH44940)
- Deprecated the prefix keyword argument in read_csv() and read_table(); in a future version the argument will be removed (GH43396)
- Deprecated passing non boolean argument to sort in concat() (GH41518)
- Deprecated passing arguments as positional for read_fwf() other than filepath_or_buffer (GH41485)
- Deprecated passing arguments as positional for read_xml() other than path_or_buffer (GH45133)
- Deprecated passing skipna=None for DataFrame.mad() and Series.mad(), pass skipna=True instead (GH44580)
- Deprecated the behavior of to_datetime() with the string “now” with utc=False; in a future version this will match Timestamp("now"), which in turn matches Timestamp.now() returning the local time (GH18705)
- Deprecated DateOffset.apply(), use offset + other instead (GH44522)
- Deprecated parameter names in Index.copy() (GH44916)
- A deprecation warning is now shown for DataFrame.to_latex() indicating the arguments signature may change and emulate more the arguments to Styler.to_latex() in future versions (GH44411)
- Deprecated behavior of concat() between objects with bool-dtype and numeric-dtypes; in a future version these will cast to object dtype instead of coercing bools to numeric values (GH39817)
- Deprecated Categorical.replace(), use Series.replace() instead (GH44929)
- Deprecated passing set or dict as indexer for DataFrame.loc.__setitem__(), DataFrame.loc.__getitem__(), Series.loc.__setitem__(), Series.loc.__getitem__(), DataFrame.__getitem__(), Series.__getitem__() and Series.__setitem__() (GH42825)
- Deprecated Index.__getitem__() with a bool key; use index.values[key] to get the old behavior (GH44051)
- Deprecated downcasting column-by-column in DataFrame.where() with integer-dtypes (GH44597)
- Deprecated DatetimeIndex.union_many(), use DatetimeIndex.union() instead (GH44091)
- Deprecated Groupby.pad() in favor of Groupby.ffill() (GH33396)
- Deprecated Groupby.backfill() in favor of Groupby.bfill() (GH33396)
- Deprecated Resample.pad() in favor of Resample.ffill() (GH33396)
- Deprecated Resample.backfill() in favor of Resample.bfill() (GH33396)
- Deprecated numeric_only=None in DataFrame.rank(); in a future version numeric_only must be either True or False (the default) (GH45036)
- Deprecated the behavior of Timestamp.utcfromtimestamp(), in the future it will return a timezone-aware UTC Timestamp (GH22451)
- Deprecated NaT.freq() (GH45071)
- Deprecated behavior of Series and DataFrame construction when passed float-dtype data containing NaN and an integer dtype ignoring the dtype argument; in a future version this will raise (GH40110)
- Deprecated the behaviour of Series.to_frame() and Index.to_frame() to ignore the name argument when name=None. Currently, this means to preserve the existing name, but in the future explicitly passing name=None will set None as the name of the column in the resulting DataFrame (GH44212)
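Several of the replacements named in the list above can be sketched in one place. This is an illustrative migration example with made-up data, assuming pandas 1.4 or later; it covers the squeeze, closed, pad and DateOffset.apply() deprecations only.

```python
import io

import pandas as pd

# Instead of read_csv(..., squeeze=True): read, then squeeze the single column.
csv = io.StringIO("value\n1\n2\n3")
ser = pd.read_csv(csv).squeeze("columns")

# Instead of date_range(..., closed="left"): use inclusive="left".
left_closed = pd.date_range("2022-01-01", "2022-01-04", inclusive="left")

# Instead of GroupBy.pad(): use GroupBy.ffill().
df = pd.DataFrame({"key": ["x", "x", "y", "y"], "val": [1.0, None, 2.0, None]})
filled = df.groupby("key")["val"].ffill()

# Instead of DateOffset.apply(other): use offset + other.
shifted = pd.DateOffset(months=1) + pd.Timestamp("2022-01-31")
```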
Performance improvements#
- Performance improvement in GroupBy.sample(), especially when weights argument provided (GH34483)
- Performance improvement when converting non-string arrays to string arrays (GH34483)
- Performance improvement in GroupBy.transform() for user-defined functions (GH41598)
- Performance improvement in constructing DataFrame objects (GH42631, GH43142, GH43147, GH43307, GH43144, GH44826)
- Performance improvement in GroupBy.shift() when fill_value argument is provided (GH26615)
- Performance improvement in DataFrame.corr() for method=pearson on data without missing values (GH40956)
- Performance improvement in some GroupBy.apply() operations (GH42992, GH43578)
- Performance improvement in read_stata() (GH43059, GH43227)
- Performance improvement in read_sas() (GH43333)
- Performance improvement in to_datetime() with uint dtypes (GH42606)
- Performance improvement in to_datetime() with infer_datetime_format set to True (GH43901)
- Performance improvement in Series.sparse.to_coo() (GH42880)
- Performance improvement in indexing with a UInt64Index (GH43862)
- Performance improvement in indexing with a Float64Index (GH43705)
- Performance improvement in indexing with a non-unique Index (GH43792)
- Performance improvement in indexing with a listlike indexer on a MultiIndex (GH43370)
- Performance improvement in indexing with a MultiIndex indexer on another MultiIndex (GH43370)
- Performance improvement in GroupBy.quantile() (GH43469, GH43725)
- Performance improvement in GroupBy.count() (GH43730, GH43694)
- Performance improvement in GroupBy.any() and GroupBy.all() (GH43675, GH42841)
- Performance improvement in GroupBy.cumsum() (GH43309)
- SparseArray.min() and SparseArray.max() no longer require converting to a dense array (GH43526)
- Indexing into a SparseArray with a slice with step=1 no longer requires converting to a dense array (GH43777)
- Performance improvement in SparseArray.take() with allow_fill=False (GH43654)
- Performance improvement in Rolling.mean(), Expanding.mean(), Rolling.sum(), Expanding.sum(), Rolling.max(), Expanding.max(), Rolling.min() and Expanding.min() with engine="numba" (GH43612, GH44176, GH45170)
- Improved performance of pandas.read_csv() with memory_map=True when file encoding is UTF-8 (GH43787)
- Performance improvement in RangeIndex.sort_values() overriding Index.sort_values() (GH43666)
- Performance improvement in RangeIndex.insert() (GH43988)
- Performance improvement in Index.insert() (GH43953)
- Performance improvement in DatetimeIndex.tolist() (GH43823)
- Performance improvement in DatetimeIndex.union() (GH42353)
- Performance improvement in Series.nsmallest() (GH43696)
- Performance improvement in DataFrame.insert() (GH42998)
- Performance improvement in DataFrame.dropna() (GH43683)
- Performance improvement in DataFrame.fillna() (GH43316)
- Performance improvement in DataFrame.values() (GH43160)
- Performance improvement in DataFrame.select_dtypes() (GH42611)
- Performance improvement in DataFrame reductions (GH43185, GH43243, GH43311, GH43609)
- Performance improvement in Series.unstack() and DataFrame.unstack() (GH43335, GH43352, GH42704, GH43025)
- Performance improvement in Series.to_frame() (GH43558)
- Performance improvement in Series.mad() (GH43010)
- Performance improvement in to_csv() when index column is a datetime and is formatted (GH39413)
- Performance improvement in to_csv() when MultiIndex contains a lot of unused levels (GH37484)
- Performance improvement in read_csv() when index_col was set with a numeric column (GH44158)
- Performance improvement in SparseArray.__getitem__() (GH23122)
- Performance improvement in constructing a DataFrame from array-like objects like a PyTorch tensor (GH44616)
Bug fixes#
Categorical#
- Bug in setting dtype-incompatible values into a Categorical (or Series or DataFrame backed by Categorical) raising ValueError instead of TypeError (GH41919)
- Bug in Categorical.searchsorted() when passing a dtype-incompatible value raising KeyError instead of TypeError (GH41919)
- Bug in Categorical.astype() casting datetimes and Timestamp to int for dtype object (GH44930)
- Bug in Series.where() with CategoricalDtype when passing a dtype-incompatible value raising ValueError instead of TypeError (GH41919)
- Bug in Categorical.fillna() when passing a dtype-incompatible value raising ValueError instead of TypeError (GH41919)
- Bug in Categorical.fillna() with a tuple-like category raising ValueError instead of TypeError when filling with a non-category tuple (GH41919)
Datetimelike#
Bug in DataFrame constructor unnecessarily copying non-datetimelike 2D object arrays (GH39272)
Bug in to_datetime() with format and pandas.NA raising ValueError (GH42957)
to_datetime() would silently swap MM/DD/YYYY and DD/MM/YYYY formats if the given dayfirst option could not be respected - now, a warning is raised in the case of delimited date strings (e.g. 31-12-2012) (GH12585)
Bug in date_range() and bdate_range() not returning the right bound when start = end and the set is closed on one side (GH43394)
Bug in inplace addition and subtraction of DatetimeIndex or TimedeltaIndex with DatetimeArray or TimedeltaArray (GH43904)
Bug in calling np.isnan, np.isfinite, or np.isinf on a timezone-aware DatetimeIndex incorrectly raising TypeError (GH43917)
Bug in constructing a Series from datetime-like strings with mixed timezones incorrectly partially-inferring datetime values (GH40111)
Bug in addition of a Tick object and a np.timedelta64 object incorrectly raising instead of returning Timedelta (GH44474)
np.maximum.reduce and np.minimum.reduce now correctly return Timestamp and Timedelta objects when operating on Series, DataFrame, or Index with datetime64[ns] or timedelta64[ns] dtype (GH43923)
Bug in adding a np.timedelta64 object to a BusinessDay or CustomBusinessDay object incorrectly raising (GH44532)
Bug in Index.insert() for inserting np.datetime64, np.timedelta64 or tuple into Index with dtype='object' with negative loc adding None and replacing existing value (GH44509)
Bug in Timestamp.to_pydatetime() failing to retain the fold attribute (GH45087)
Bug in Series.mode() with DatetimeTZDtype incorrectly returning timezone-naive and PeriodDtype incorrectly raising (GH41927)
Fixed regression in reindex() raising an error when using an incompatible fill value with a datetime-like dtype (or not raising a deprecation warning for using a datetime.date as fill value) (GH42921)
Bug in DateOffset addition with Timestamp where offset.nanoseconds would not be included in the result (GH43968, GH36589)
Bug in Timestamp.fromtimestamp() not supporting the tz argument (GH45083)
Bug in DataFrame construction from dict of Series with mismatched index dtypes sometimes raising depending on the ordering of the passed dict (GH44091)
Bug in Timestamp hashing during some DST transitions caused a segmentation fault (GH33931 and GH40817)
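The np.maximum.reduce and np.minimum.reduce entry above can be tried with a small sketch like the following. It assumes pandas 1.4 or later and a datetime64 Series built with pd.to_datetime(); the sample dates are invented.

```python
import numpy as np
import pandas as pd

# A small datetime64 Series with made-up dates.
dates = pd.Series(pd.to_datetime(["2021-01-01", "2021-06-01", "2021-03-15"]))

# Reducing with NumPy ufuncs is expected to hand back pandas scalars.
latest = np.maximum.reduce(dates)
earliest = np.minimum.reduce(dates)

print(type(latest).__name__, latest, earliest)
```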
Timedelta#
Bug in division of all-NaT TimedeltaIndex, Series or DataFrame column with object-dtype array-like of numbers failing to infer the result as timedelta64-dtype (GH39750)
Bug in floor division of timedelta64[ns] data with a scalar returning garbage values (GH44466)
Timedelta now properly takes into account any nanoseconds contribution of any kwarg (GH43764, GH45227)
Time Zones#
Bug in to_datetime() with infer_datetime_format=True failing to parse zero UTC offset (Z) correctly (GH41047)
Bug in Series.dt.tz_convert() resetting index in a Series with CategoricalIndex (GH43080)
Bug in Timestamp and DatetimeIndex incorrectly raising a TypeError when subtracting two timezone-aware objects with mismatched timezones (GH31793)
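A minimal sketch of the last entry above: subtracting two timezone-aware Timestamp objects whose timezones differ. The fixed UTC+02:00 offset is chosen only so the example does not depend on a timezone database; the names are illustrative.

```python
from datetime import timedelta, timezone

import pandas as pd

# Same wall-clock time, two different timezones.
utc_noon = pd.Timestamp("2021-01-01 12:00", tz=timezone.utc)
plus_two_noon = pd.Timestamp("2021-01-01 12:00", tz=timezone(timedelta(hours=2)))

# 12:00 at UTC+02:00 is 10:00 UTC, so the difference should be two hours.
delta = utc_noon - plus_two_noon

print(delta)
```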
Numeric#
Bug in floor-dividing a list or tuple of integers by a Series incorrectly raising (GH44674)
Bug in DataFrame.rank() raising ValueError with object columns and method="first" (GH41931)
Bug in DataFrame.rank() treating missing values and extreme values as equal (for example np.nan and np.inf), causing incorrect results when na_option="bottom" or na_option="top" is used (GH41931)
Bug in numexpr engine still being used when the option compute.use_numexpr is set to False (GH32556)
Bug in DataFrame arithmetic ops with a subclass whose _constructor() attribute is a callable other than the subclass itself (GH43201)
Bug in arithmetic operations involving RangeIndex where the result would have the incorrect name (GH43962)
Bug in arithmetic operations involving Series where the result could have the incorrect name when the operands have matching NA or matching tuple names (GH44459)
Bug in division with IntegerDtype or BooleanDtype array and NA scalar incorrectly raising (GH44685)
Bug in multiplying a Series with FloatingDtype with a timedelta-like scalar incorrectly raising (GH44772)
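To make the first entry above concrete, here is a small sketch of floor-dividing a plain Python list by a Series, assuming pandas 1.4 or later. The numbers are arbitrary.

```python
import pandas as pd

divisors = pd.Series([3, 4, 5])

# The list is on the left-hand side, so pandas handles the reflected operation.
quotients = [10, 20, 30] // divisors

print(quotients.tolist())
```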
Conversion#
Bug in UInt64Index constructor when passing a list containing both positive integers small enough to cast to int64 and integers too large to hold in int64 (GH42201)
Bug in Series constructor returning 0 for missing values with dtype int64 and False for dtype bool (GH43017, GH43018)
Bug in constructing a DataFrame from a PandasArray containing Series objects behaving differently than an equivalent np.ndarray (GH43986)
Bug in IntegerDtype not allowing coercion from string dtype (GH25472)
Bug in to_datetime() with arg:xr.DataArray and unit="ns" specified raising TypeError (GH44053)
Bug in DataFrame.convert_dtypes() not returning the correct type when a subclass does not overload _constructor_sliced() (GH43201)
Bug in DataFrame.astype() not propagating attrs from the original DataFrame (GH44414)
Bug in DataFrame.convert_dtypes() result losing columns.names (GH41435)
Bug in constructing an IntegerArray from pyarrow data failing to validate dtypes (GH44891)
Bug in Series.astype() not allowing converting from a PeriodDtype to datetime64 dtype, inconsistent with the PeriodIndex behavior (GH45038)
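The DataFrame.astype() entry above concerns the attrs dictionary. A minimal sketch, assuming pandas 1.4 or later; the key "source" and its value are hypothetical metadata invented for the example.

```python
import pandas as pd

frame = pd.DataFrame({"reading": [1, 2, 3]})
# attrs is a free-form metadata dict; this entry is made up.
frame.attrs["source"] = "sensor-a"

# After the fix, the converted frame is expected to carry the same attrs.
converted = frame.astype("float64")

print(converted.attrs, converted["reading"].dtype)
```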
Strings#
Bug in checking for string[pyarrow] dtype incorrectly raising an ImportError when pyarrow is not installed (GH44276)
Interval#
Bug in Series.where() with IntervalDtype incorrectly raising when the where call should not replace anything (GH44181)
Indexing#
Bug in Series.rename() with MultiIndex when level is provided (GH43659)
Bug in DataFrame.truncate() and Series.truncate() when the object’s Index has a length greater than one but only one unique value (GH42365)
Bug in Series.loc() and DataFrame.loc() with a MultiIndex when indexing with a tuple in which one of the levels is also a tuple (GH27591)
Bug in Series.loc() with a MultiIndex whose first level contains only np.nan values (GH42055)
Bug in indexing on a Series or DataFrame with a DatetimeIndex when passing a string, where the return type depended on whether the index was monotonic (GH24892)
Bug in indexing on a MultiIndex failing to drop scalar levels when the indexer is a tuple containing a datetime-like string (GH42476)
Bug in DataFrame.sort_values() and Series.sort_values() when passing an ascending value, failing to raise or incorrectly raising ValueError (GH41634)
Bug in updating values of pandas.Series using boolean index, created by using pandas.DataFrame.pop() (GH42530)
Bug in Index.get_indexer_non_unique() when index contains multiple np.nan (GH35392)
Bug in DataFrame.query() not handling the degree sign in a backticked column name, such as `Temp(°C)`, used in an expression to query a DataFrame (GH42826)
Bug in DataFrame.drop() where the error message did not show missing labels with commas when raising KeyError (GH42881)
Bug in DataFrame.query() where method calls in query strings led to errors when the numexpr package was installed (GH22435)
Bug in DataFrame.nlargest() and Series.nlargest() where sorted result did not count indexes containing np.nan (GH28984)
Bug in indexing on a non-unique object-dtype Index with an NA scalar (e.g. np.nan) (GH43711)
Bug in DataFrame.__setitem__() incorrectly writing into an existing column’s array rather than setting a new array when the new dtype and the old dtype match (GH43406)
Bug in setting floating-dtype values into a Series with integer dtype failing to set inplace when those values can be losslessly converted to integers (GH44316)
Bug in Series.__setitem__() with object dtype when setting an array with matching size and dtype='datetime64[ns]' or dtype='timedelta64[ns]' incorrectly converting the datetime/timedeltas to integers (GH43868)
Bug in DataFrame.sort_index() where ignore_index=True was not being respected when the index was already sorted (GH43591)
Bug in Index.get_indexer_non_unique() when index contains multiple np.datetime64("NaT") and np.timedelta64("NaT") (GH43869)
Bug in setting a scalar Interval value into a Series with IntervalDtype when the scalar’s sides are floats and the values’ sides are integers (GH44201)
Bug when setting string-backed Categorical values that can be parsed to datetimes into a DatetimeArray or Series or DataFrame column backed by DatetimeArray failing to parse these strings (GH44236)
Bug in Series.__setitem__() with an integer dtype other than int64 setting with a range object unnecessarily upcasting to int64 (GH44261)
Bug in Series.__setitem__() with a boolean mask indexer setting a listlike value of length 1 incorrectly broadcasting that value (GH44265)
Bug in Series.reset_index() not ignoring name argument when drop and inplace are set to True (GH44575)
Bug in DataFrame.loc.__setitem__() and DataFrame.iloc.__setitem__() with mixed dtypes sometimes failing to operate in-place (GH44345)
Bug in DataFrame.loc.__getitem__() incorrectly raising KeyError when selecting a single column with a boolean key (GH44322)
Bug in setting DataFrame.iloc() with a single ExtensionDtype column and setting 2D values e.g. df.iloc[:] = df.values incorrectly raising (GH44514)
Bug in setting values with DataFrame.iloc() with a single ExtensionDtype column and a tuple of arrays as the indexer (GH44703)
Bug in indexing on columns with loc or iloc using a slice with a negative step with ExtensionDtype columns incorrectly raising (GH44551)
Bug in DataFrame.loc.__setitem__() changing dtype when indexer was completely False (GH37550)
Bug in IntervalIndex.get_indexer_non_unique() returning boolean mask instead of array of integers for a non unique and non monotonic index (GH44084)
Bug in IntervalIndex.get_indexer_non_unique() not handling targets of dtype 'object' with NaNs correctly (GH44482)
Fixed regression where a single column np.matrix was no longer coerced to a 1d np.ndarray when added to a DataFrame (GH42376)
Bug in Series.__getitem__() with a CategoricalIndex of integers treating lists of integers as positional indexers, inconsistent with the behavior with a single scalar integer (GH15470, GH14865)
Bug in Series.__setitem__() when setting floats or integers into integer-dtype Series failing to upcast when necessary to retain precision (GH45121)
Bug in DataFrame.iloc.__setitem__() ignoring the axis argument (GH45032)
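One of the entries above, DataFrame.sort_index() with ignore_index=True on an already sorted index, is easy to check by hand. The sketch below assumes pandas 1.4 or later and uses an invented frame.

```python
import pandas as pd

# The index is already in ascending order, which was the problematic case.
frame = pd.DataFrame({"value": [1, 2, 3]}, index=[10, 20, 30])

# With ignore_index=True the result should be relabeled 0..n-1.
result = frame.sort_index(ignore_index=True)

print(list(result.index))
```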
Missing#
Bug in DataFrame.fillna() with limit and no method ignoring axis='columns' or axis=1 (GH40989, GH17399)
Bug in DataFrame.fillna() not replacing missing values when using a dict-like value and duplicate column names (GH43476)
Bug in constructing a DataFrame with a dictionary that has np.datetime64 as a value and dtype='timedelta64[ns]', or vice-versa, incorrectly casting instead of raising (GH44428)
Bug in Series.interpolate() and DataFrame.interpolate() with inplace=True not writing to the underlying array(s) in-place (GH44749)
Bug in Index.fillna() incorrectly returning an unfilled Index when NA values are present and downcast argument is specified. This now raises NotImplementedError instead; do not pass downcast argument (GH44873)
Bug in DataFrame.dropna() changing Index even if no entries were dropped (GH41965)
Bug in Series.fillna() with an object-dtype incorrectly ignoring downcast="infer" (GH44241)
MultiIndex#
Bug in MultiIndex.get_loc() where the first level is a DatetimeIndex and a string key is passed (GH42465)
Bug in MultiIndex.reindex() when passing a level that corresponds to an ExtensionDtype level (GH42043)
Bug in MultiIndex.get_loc() raising TypeError instead of KeyError on nested tuple (GH42440)
Bug in MultiIndex.union() setting wrong sortorder causing errors in subsequent indexing operations with slices (GH44752)
Bug in MultiIndex.putmask() where the other value was also a MultiIndex (GH43212)
Bug in MultiIndex.dtypes() where duplicate level names returned only one dtype per name (GH45174)
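The MultiIndex.dtypes entry above can be sketched as follows, assuming pandas 1.4 or later. The level name "level" is deliberately duplicated and is otherwise arbitrary; dtypes is accessed here as a property.

```python
import pandas as pd

# Two levels that share the same name.
mi = pd.MultiIndex.from_arrays([[1, 2], ["a", "b"]], names=["level", "level"])

# One dtype per level is expected, even though the names collide.
level_dtypes = mi.dtypes

print(len(level_dtypes), list(level_dtypes.index))
```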
I/O#
Bug in read_excel() attempting to read chart sheets from .xlsx files (GH41448)
Bug in json_normalize() where errors="ignore" could fail to ignore missing values of meta when record_path has a length greater than one (GH41876)
Bug in read_csv() with multi-header input and arguments referencing column names as tuples (GH42446)
Bug in read_fwf(), where difference in lengths of colspecs and names was not raising ValueError (GH40830)
Bug in Series.to_json() and DataFrame.to_json() where some attributes were skipped when serializing plain Python objects to JSON (GH42768, GH33043)
Bug where column headers were dropped when constructing a DataFrame from a SQLAlchemy Row object (GH40682)
Bug in unpickling an Index with object dtype incorrectly inferring numeric dtypes (GH43188)
Bug in read_csv() where reading multi-header input with unequal lengths incorrectly raised IndexError (GH43102)
Bug in read_csv() raising ParserError when reading file in chunks and some chunk blocks have fewer columns than header for engine="c" (GH21211)
Bug in read_csv() where the exception class raised when expecting a file path name or file-like object was OSError; it is now TypeError (GH43366)
Bug in read_csv() and read_fwf() ignoring all skiprows except first when nrows is specified for engine='python' (GH44021, GH10261)
Bug in read_csv() keeping the original column in object format when keep_date_col=True is set (GH13378)
Bug in read_json() not handling non-numpy dtypes correctly (especially category) (GH21892, GH33205)
Bug in json_normalize() where multi-character sep parameter is incorrectly prefixed to every key (GH43831)
Bug in json_normalize() where reading data with missing multi-level metadata would not respect errors="ignore" (GH44312)
Bug in read_csv() using the second row to guess implicit index if header was set to None for engine="python" (GH22144)
Bug in read_csv() not recognizing bad lines when names were given for engine="c" (GH22144)
Bug in read_csv() with float_precision="round_trip" which did not skip initial/trailing whitespace (GH43713)
Bug when Python is built without the lzma module: a warning was raised at the pandas import time, even if the lzma capability isn’t used (GH43495)
Bug in read_csv() not applying dtype for index_col (GH9435)
Bug in dumping/loading a DataFrame with yaml.dump(frame) (GH42748)
Bug in read_csv() raising ValueError when names was longer than header but equal to data rows for engine="python" (GH38453)
Bug in ExcelWriter, where engine_kwargs were not passed through to all engines (GH43442)
Bug in read_csv() raising ValueError when parse_dates was used with MultiIndex columns (GH8991)
Bug in read_csv() not raising a ValueError when \n was specified as delimiter or sep which conflicts with lineterminator (GH43528)
Bug in to_csv() converting datetimes in categorical Series to integers (GH40754)
Bug in read_csv() converting columns to numeric after date parsing failed (GH11019)
Bug in read_csv() not replacing NaN values with np.nan before attempting date conversion (GH26203)
Bug in read_csv() raising AttributeError when attempting to read a .csv file and infer index column dtype from a nullable integer type (GH44079)
Bug in to_csv() always coercing datetime columns with different formats to the same format (GH21734)
DataFrame.to_csv() and Series.to_csv() with compression set to 'zip' no longer create a zip file containing a file ending with “.zip”. Instead, they try to infer the inner file name more smartly (GH39465)
Bug in read_csv() where reading a mixed column of booleans and missing values to a float type results in the missing values becoming 1.0 rather than NaN (GH42808, GH34120)
Bug in to_xml() raising error for pd.NA with extension array dtype (GH43903)
Bug in read_csv() when passing simultaneously a parser in date_parser and parse_dates=False, the parsing was still called (GH44366)
Bug in read_csv() not setting name of MultiIndex columns correctly when index_col is not the first column (GH38549)
Bug in read_csv() silently ignoring errors when failing to create a memory-mapped file (GH44766)
Bug in read_csv() when passing a tempfile.SpooledTemporaryFile opened in binary mode (GH44748)
Bug in read_json() raising ValueError when attempting to parse json strings containing “://” (GH36271)
Bug in read_csv() when engine="c" and encoding_errors=None which caused a segfault (GH45180)
Bug in read_csv() where an invalid value of usecols led to an unclosed file handle (GH45384)
Fixed memory leak in DataFrame.to_json() (GH43877)
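As a small illustration of the read_csv() entry about "\n" as a separator, the sketch below passes sep="\n" and records the resulting error. It assumes pandas 1.4 or later; the in-memory CSV text is invented, and the exact wording of the error message is not relied upon.

```python
import io

import pandas as pd

data = io.StringIO("a,b\n1,2\n")

# "\n" conflicts with the line terminator, so a ValueError is expected.
try:
    pd.read_csv(data, sep="\n")
except ValueError as err:
    message = str(err)
else:
    message = None

print(message)
```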
Period#
Bug in adding a Period object to a np.timedelta64 object incorrectly raising TypeError (GH44182)
Bug in PeriodIndex.to_timestamp() when the index has freq="B" inferring freq="D" for its result instead of freq="B" (GH44105)
Bug in Period constructor incorrectly allowing np.timedelta64("NaT") (GH44507)
Bug in PeriodIndex.to_timestamp() giving incorrect values for indexes with non-contiguous data (GH44100)
Bug in Series.where() with PeriodDtype incorrectly raising when the where call should not replace anything (GH45135)
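The first entry above, adding a Period to a np.timedelta64, might look like this in practice. This is a sketch assuming pandas 1.4 or later, with the np.timedelta64 on the left-hand side; the date is arbitrary.

```python
import numpy as np
import pandas as pd

period = pd.Period("2021-01-01", freq="D")
one_day = np.timedelta64(1, "D")

# NumPy scalar on the left; pandas is expected to handle the reflected add.
shifted = one_day + period

print(shifted)
```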
Plotting#
When given non-numeric data, DataFrame.boxplot() now raises a ValueError rather than a cryptic KeyError or ZeroDivisionError, in line with other plotting functions like DataFrame.hist() (GH43480)
Groupby/resample/rolling#
Bug in SeriesGroupBy.apply() where passing an unrecognized string argument failed to raise TypeError when the underlying Series is empty (GH42021)
Bug in Series.rolling.apply(), DataFrame.rolling.apply(), Series.expanding.apply() and DataFrame.expanding.apply() with engine="numba" where *args were being cached with the user passed function (GH42287)
Bug in GroupBy.max() and GroupBy.min() with nullable integer dtypes losing precision (GH41743)
Bug in DataFrame.groupby.rolling.var() calculating the rolling variance only on the first group (GH42442)
Bug in GroupBy.shift() that would return the grouping columns if fill_value was not None (GH41556)
Bug in SeriesGroupBy.nlargest() and SeriesGroupBy.nsmallest() having an inconsistent index when the input Series was sorted and n was greater than or equal to all group sizes (GH15272, GH16345, GH29129)
Bug in pandas.DataFrame.ewm(), where non-float64 dtypes were silently failing (GH42452)
Bug in pandas.DataFrame.rolling() operation along rows (axis=1) incorrectly omitting columns containing float16 and float32 (GH41779)
Bug in Resampler.aggregate() not allowing the use of Named Aggregation (GH32803)
Bug in Series.rolling() when the Series dtype was Int64 (GH43016)
Bug in DataFrame.rolling.corr() when the DataFrame columns were a MultiIndex (GH21157)
Bug in DataFrame.groupby.rolling() when specifying on and calling __getitem__ would subsequently return incorrect results (GH43355)
Bug in GroupBy.apply() with time-based Grouper objects incorrectly raising ValueError in corner cases where the grouping vector contains a NaT (GH43500, GH43515)
Bug in GroupBy.mean() failing with complex dtype (GH43701)
Bug in Series.rolling() and DataFrame.rolling() not calculating window bounds correctly for the first row when center=True and index is decreasing (GH43927)
Bug in Series.rolling() and DataFrame.rolling() for centered datetimelike windows with uneven nanosecond (GH43997)
Bug in GroupBy.mean() raising KeyError when column was selected at least twice (GH44924)
Bug in GroupBy.nth() failing on axis=1 (GH43926)
Bug in Series.rolling() and DataFrame.rolling() not respecting right bound on centered datetime-like windows, if the index contains duplicates (GH3944)
Bug in Series.rolling() and DataFrame.rolling() when using a pandas.api.indexers.BaseIndexer subclass that returned unequal start and end arrays would segfault instead of raising a ValueError (GH44470)
Bug in GroupBy.nunique() not respecting observed=True for categorical grouping columns (GH45128)
Bug in GroupBy.head() and GroupBy.tail() not dropping groups with NaN when dropna=True (GH45089)
Bug in GroupBy.__iter__() after selecting a subset of columns in a GroupBy object, which returned all columns instead of the chosen subset (GH44821)
Bug in GroupBy.rolling() when non-monotonic data is passed, failing to correctly raise ValueError (GH43909)
Bug where grouping by a Series that has a categorical data type and length unequal to the axis of grouping raised ValueError (GH44179)
Reshaping#
Improved error message when creating a DataFrame column from a multi-dimensional numpy.ndarray (GH42463)
Bug in concat() creating MultiIndex with duplicate level entries when concatenating a DataFrame with duplicates in Index and multiple keys (GH42651)
Bug in pandas.cut() on Series with duplicate indices and non-exact pandas.CategoricalIndex() (GH42185, GH42425)
Bug in DataFrame.append() failing to retain dtypes when appended columns do not match (GH43392)
Bug in concat() of bool and boolean dtypes resulting in object dtype instead of boolean dtype (GH42800)
Bug in crosstab() when inputs are categorical Series, there are categories that are not present in one or both of the Series, and margins=True. Previously the margin value for missing categories was NaN. It is now correctly reported as 0 (GH43505)
Bug in concat() failing when the objs argument all had the same index and the keys argument contained duplicates (GH43595)
Bug in merge() with MultiIndex as column index for the on argument returning an error when assigning a column internally (GH43734)
Bug in crosstab() failing when inputs are lists or tuples (GH44076)
Bug in DataFrame.append() failing to retain index.name when appending a list of Series objects (GH44109)
Fixed metadata propagation in DataFrame.apply() method, consequently fixing the same issue for DataFrame.transform(), DataFrame.nunique() and DataFrame.mode() (GH28283)
Bug in concat() casting levels of MultiIndex to float if all levels only consist of missing values (GH44900)
Bug in DataFrame.stack() with ExtensionDtype columns incorrectly raising (GH43561)
Bug in merge() raising KeyError when joining over differently named indexes with on keywords (GH45094)
Bug in Series.unstack() with object doing unwanted type inference on resulting columns (GH44595)
Bug in MultiIndex.join() with overlapping IntervalIndex levels (GH44096)
Bug in DataFrame.replace() and Series.replace() resulting in a different dtype depending on the regex parameter (GH44864)
Bug in DataFrame.pivot() with index=None when the DataFrame index was a MultiIndex (GH23955)
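For the concat() entry about bool and boolean dtypes above, a minimal sketch follows. It assumes pandas 1.4 or later; the two Series are made up, one with NumPy bool dtype and one with the nullable boolean dtype.

```python
import pandas as pd

plain = pd.Series([True, False])                      # NumPy bool dtype
nullable = pd.Series([True, pd.NA], dtype="boolean")  # nullable boolean dtype

# The combined result is expected to use the nullable boolean dtype.
combined = pd.concat([plain, nullable], ignore_index=True)

print(combined.dtype)
```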
Sparse#
Bug in DataFrame.sparse.to_coo() raising AttributeError when column names are not unique (GH29564)
Bug in SparseArray.max() and SparseArray.min() raising ValueError for arrays with 0 non-null elements (GH43527)
Bug in DataFrame.sparse.to_coo() silently converting non-zero fill values to zero (GH24817)
Bug in SparseArray comparison methods with an array-like operand of mismatched length raising AssertionError or unclear ValueError depending on the input (GH43863)
Bug in SparseArray arithmetic methods floordiv and mod behaviors when dividing by zero not matching the non-sparse Series behavior (GH38172)
Bug in SparseArray unary methods as well as SparseArray.isna() not recalculating indexes (GH44955)
ExtensionArray#
NumPy ufuncs np.abs, np.positive, np.negative now correctly preserve dtype when called on ExtensionArrays that implement __abs__, __pos__, __neg__, respectively. In particular this is fixed for TimedeltaArray (GH43899, GH23316)
NumPy ufuncs np.minimum.reduce, np.maximum.reduce, np.add.reduce, and np.multiply.reduce now work correctly instead of raising NotImplementedError on Series with IntegerDtype or FloatDtype (GH43923, GH44793)
NumPy ufuncs with out keyword are now supported by arrays with IntegerDtype and FloatingDtype (GH45122)
Avoid raising PerformanceWarning about fragmented DataFrame when using many columns with an extension dtype (GH44098)
Bug in IntegerArray and FloatingArray construction incorrectly coercing mismatched NA values (e.g. np.timedelta64("NaT")) to numeric NA (GH44514)
Bug in BooleanArray.__eq__() and BooleanArray.__ne__() raising TypeError on comparison with an incompatible type (like a string). This caused DataFrame.replace() to sometimes raise a TypeError if a nullable boolean column was included (GH44499)
Bug in array() incorrectly raising when passed an ndarray with float16 dtype (GH44715)
Bug in calling np.sqrt on BooleanArray returning a malformed FloatingArray (GH44715)
Bug in Series.where() with ExtensionDtype when other is a NA scalar incompatible with the Series dtype (e.g. NaT with a numeric dtype) incorrectly casting to a compatible NA value (GH44697)
Bug in Series.replace() where explicitly passing value=None is treated as if no value was passed, and None not being in the result (GH36984, GH19998)
Bug in Series.replace() with unwanted downcasting being done in no-op replacements (GH44498)
Bug in Series.replace() with FloatDtype, string[python], or string[pyarrow] dtype not being preserved when possible (GH33484, GH40732, GH31644, GH41215, GH25438)
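The first entry above mentions TimedeltaArray specifically. The following sketch, assuming pandas 1.4 or later, applies np.abs to a small TimedeltaArray obtained through the .array attribute of a TimedeltaIndex; the values are arbitrary.

```python
import numpy as np
import pandas as pd

# .array exposes the TimedeltaArray that backs the index.
arr = pd.to_timedelta(["-1 days", "2 days"]).array

# The ufunc result is expected to stay a TimedeltaArray.
result = np.abs(arr)

print(type(result).__name__, result[0], result[1])
```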
Styler#
Bug in Styler where the uuid at initialization maintained a floating underscore (GH43037)
Bug in Styler.to_html() where the Styler object was updated if the to_html method was called with some args (GH43034)
Bug in Styler.copy() where uuid was not previously copied (GH40675)
Bug in Styler.apply() where functions which returned Series objects were not correctly handled in terms of aligning their index labels (GH13657, GH42014)
Bug when rendering an empty DataFrame with a named Index (GH43305)
Bug when rendering a single level MultiIndex (GH43383)
Bug when combining non-sparse rendering and Styler.hide_columns() or Styler.hide_index() (GH43464)
Bug setting a table style when using multiple selectors in Styler (GH44011)
Bugs where row trimming and column trimming failed to reflect hidden rows (GH43703, GH44247)
Other#
Bug in DataFrame.astype() with non-unique columns and a Series dtype argument (GH44417)
Bug in CustomBusinessMonthBegin.__add__() (CustomBusinessMonthEnd.__add__()) not applying the extra offset parameter when beginning (end) of the target month is already a business day (GH41356)
Bug in RangeIndex.union() with another RangeIndex with matching (even) step and starts differing by strictly less than step / 2 (GH44019)
Bug in RangeIndex.difference() with sort=None and step<0 failing to sort (GH44085)
Bug in Series.replace() and DataFrame.replace() with value=None and ExtensionDtypes (GH44270, GH37899)
Bug in FloatingArray.equals() failing to consider two arrays equal if they contain np.nan values (GH44382)
Bug in DataFrame.shift() with axis=1 and ExtensionDtype columns incorrectly raising when an incompatible fill_value is passed (GH44564)
Bug in DataFrame.shift() with axis=1 and periods larger than len(frame.columns) producing an invalid DataFrame (GH44978)
Bug in DataFrame.diff() when passing a NumPy integer object instead of an int object (GH44572)
Bug in Series.replace() raising ValueError when using regex=True with a Series containing np.nan values (GH43344)
Bug in DataFrame.to_records() where an incorrect n was used when missing names were replaced by level_n (GH44818)
Bug in DataFrame.eval() where resolvers argument was overriding the default resolvers (GH34966)
Series.__repr__() and DataFrame.__repr__() no longer replace all null-values in indexes with “NaN” but use their real string-representations. “NaN” is used only for float("nan") (GH45263)
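As one example from the list above, the DataFrame.diff() entry can be checked by comparing a NumPy integer period against a plain int. This is a sketch assuming pandas 1.4 or later, with an invented frame.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"value": [1, 4, 9, 16]})

# A NumPy integer for periods should behave like the equivalent Python int.
with_numpy_int = frame.diff(np.int64(1))
with_python_int = frame.diff(1)

print(with_numpy_int["value"].tolist())
```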
Contributors#
A total of 275 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.
Abhishek R
Albert Villanova del Moral
Alessandro Bisiani +
Alex Lim
Alex-Gregory-1 +
Alexander Gorodetsky
Alexander Regueiro +
Alexey Györi
Alexis Mignon
Aleš Erjavec
Ali McMaster
Alibi +
Andrei Batomunkuev +
Andrew Eckart +
Andrew Hawyrluk
Andrew Wood
Anton Lodder +
Armin Berres +
Arushi Sharma +
Benedikt Heidrich +
Beni Bienz +
Benoît Vinot
Bert Palm +
Boris Rumyantsev +
Brian Hulette
Brock
Bruno Costa +
Bryan Racic +
Caleb Epstein
Calvin Ho
ChristofKaufmann +
Christopher Yeh +
Chuliang Xiao +
ClaudiaSilver +
DSM
Daniel Coll +
Daniel Schmidt +
Dare Adewumi
David +
David Sanders +
David Wales +
Derzan Chiang +
DeviousLab +
Dhruv B Shetty +
Digres45 +
Dominik Kutra +
Drew Levitt +
DriesS
EdAbati
Elle
Elliot Rampono
Endre Mark Borza
Erfan Nariman
Evgeny Naumov +
Ewout ter Hoeven +
Fangchen Li
Felix Divo
Felix Dulys +
Francesco Andreuzzi +
Francois Dion +
Frans Larsson +
Fred Reiss
GYvan
Gabriel Di Pardi Arruda +
Gesa Stupperich
Giacomo Caria +
Greg Siano +
Griffin Ansel
Hiroaki Ogasawara +
Horace +
Horace Lai +
Irv Lustig
Isaac Virshup
JHM Darbyshire (MBP)
JHM Darbyshire (iMac)
JHM Darbyshire +
Jack Liu
Jacob Skwirsk +
Jaime Di Cristina +
James Holcombe +
Janosh Riebesell +
Jarrod Millman
Jason Bian +
Jeff Reback
Jernej Makovsek +
Jim Bradley +
Joel Gibson +
Joeperdefloep +
Johannes Mueller +
John S Bogaardt +
John Zangwill +
Jon Haitz Legarreta Gorroño +
Jon Wiggins +
Jonas Haag +
Joris Van den Bossche
Josh Friedlander
José Duarte +
Julian Fleischer +
Julien de la Bruère-T
Justin McOmie
Kadatatlu Kishore +
Kaiqi Dong
Kashif Khan +
Kavya9986 +
Kendall +
Kevin Sheppard
Kiley Hewitt
Koen Roelofs +
Krishna Chivukula
KrishnaSai2020
Leonardo Freua +
Leonardus Chen
Liang-Chi Hsieh +
Loic Diridollou +
Lorenzo Maffioli +
Luke Manley +
LunarLanding +
Marc Garcia
Marcel Bittar +
Marcel Gerber +
Marco Edward Gorelli
Marco Gorelli
MarcoGorelli
Marvin +
Mateusz Piotrowski +
Mathias Hauser +
Matt Richards +
Matthew Davis +
Matthew Roeschke
Matthew Zeitlin
Matthias Bussonnier
Matti Picus
Mauro Silberberg +
Maxim Ivanov
Maximilian Carr +
MeeseeksMachine
Michael Sarrazin +
Michael Wang +
Michał Górny +
Mike Phung +
Mike Taves +
Mohamad Hussein Rkein +
NJOKU OKECHUKWU VALENTINE +
Neal McBurnett +
Nick Anderson +
Nikita Sobolev +
Olivier Cavadenti +
PApostol +
Pandas Development Team
Patrick Hoefler
Peter
Peter Tillmann +
Prabha Arivalagan +
Pradyumna Rahul
Prerana Chakraborty
Prithvijit +
Rahul Gaikwad +
Ray Bell
Ricardo Martins +
Richard Shadrach
Robbert-jan ‘t Hoen +
Robert Voyer +
Robin Raymond +
Rohan Sharma +
Rohan Sirohia +
Roman Yurchak
Ruan Pretorius +
Sam James +
Scott Talbert
Shashwat Sharma +
Sheogorath27 +
Shiv Gupta
Shoham Debnath
Simon Hawkins
Soumya +
Stan West +
Stefanie Molin +
Stefano Alberto Russo +
Stephan Heßelmann
Stephen
Suyash Gupta +
Sven
Swanand01 +
Sylvain Marié +
TLouf
Tania Allard +
Terji Petersen
TheDerivator +
Thomas Dickson
Thomas Kastl +
Thomas Kluyver
Thomas Li
Thomas Smith
Tim Swast
Tim Tran +
Tobias McNulty +
Tobias Pitters
Tomoki Nakagawa +
Tony Hirst +
Torsten Wörtwein
V.I. Wood +
Vaibhav K +
Valentin Oliver Loftsson +
Varun Shrivastava +
Vivek Thazhathattil +
Vyom Pathak
Wenjun Si
William Andrea +
William Bradley +
Wojciech Sadowski +
Yao-Ching Huang +
Yash Gupta +
Yiannis Hadjicharalambous +
Yoshiki Vázquez Baeza
Yuanhao Geng
Yury Mikhaylov
Yvan Gatete +
Yves Delley +
Zach Rait
Zbyszek Królikowski +
Zero +
Zheyuan
Zhiyi Wu +
aiudirog
ali sayyah +
aneesh98 +
aptalca
arw2019 +
attack68
brendandrury +
bubblingoak +
calvinsomething +
claws +
deponovo +
dicristina
el-g-1 +
evensure +
fotino21 +
fshi01 +
gfkang +
github-actions[bot]
i-aki-y
jbrockmendel
jreback
juliandwain +
jxb4892 +
kendall smith +
lmcindewar +
lrepiton
maximilianaccardo +
michal-gh
neelmraman
partev
phofl +
pratyushsharan +
quantumalaviya +
rafael +
realead
rocabrera +
rosagold
saehuihwang +
salomondush +
shubham11941140 +
srinivasan +
stphnlyd
suoniq
trevorkask +
tushushu
tyuyoshi +
usersblock +
vernetya +
vrserpa +
willie3838 +
zeitlinv +
zhangxiaoxing +