What’s new in 1.4.0 (January 22, 2022)#
These are the changes in pandas 1.4.0. See Release notes for a full changelog including other versions of pandas.
Enhancements#
Improved warning messages#
Previously, warning messages may have pointed to lines within the pandas
library. Running the script setting_with_copy_warning.py
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3]})
df[:2].loc[:, 'a'] = 5
with pandas 1.3 resulted in:
.../site-packages/pandas/core/indexing.py:1951: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
This made it difficult to determine where the warning was being generated from. Now pandas will inspect the call stack, reporting the first line outside of the pandas library that gave rise to the warning. The output of the above script is now:
setting_with_copy_warning.py:4: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
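The call-stack inspection described above can be sketched roughly as follows. This is an illustrative approximation, not pandas' actual implementation: the helper name first_frame_outside and its package_dir parameter are hypothetical, and the directory used in the usage lines is a placeholder.

```python
import inspect
import os
import warnings


def first_frame_outside(package_dir: str) -> int:
    """Return a stacklevel pointing at the first caller outside package_dir.

    Hypothetical helper for illustration only.
    """
    # Start at the caller of this helper; from inside that caller,
    # warnings.warn(..., stacklevel=1) would point at the caller itself.
    frame = inspect.currentframe().f_back
    level = 1
    while frame is not None:
        filename = os.path.abspath(frame.f_code.co_filename)
        if not filename.startswith(package_dir):
            break  # first frame that does not live inside the package
        frame = frame.f_back
        level += 1
    return level


# With a directory that contains none of the calling code, the immediate
# caller is already "outside", so the warning is attributed to this line.
level = first_frame_outside(os.path.join(os.sep, "no", "such", "package"))
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warnings.warn("example warning", UserWarning, stacklevel=level)
```

A library would pass its own installation directory as package_dir, so that frames inside the library are skipped and the warning is attributed to user code.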
Index can hold arbitrary ExtensionArrays#
Until now, passing a custom ExtensionArray to pd.Index would cast
the array to object dtype. Now Index can directly hold arbitrary
ExtensionArrays (GH43930).
For example:
In [1]: arr = pd.array([1, 2, pd.NA])
In [2]: idx = pd.Index(arr)
In the old behavior, idx would be object-dtype:
Previous behavior:
In [1]: idx
Out[1]: Index([1, 2, <NA>], dtype='object')
With the new behavior, we keep the original dtype:
New behavior:
In [3]: idx
Out[3]: Index([1, 2, <NA>], dtype='Int64')
One exception to this is SparseArray, which will continue to cast to numpy
dtype until pandas 2.0. At that point it will retain its dtype like other
ExtensionArrays.
Styler#
Styler has been further developed in 1.4.0. The following general enhancements have been made:
- Styling and formatting of indexes has been added, with Styler.apply_index(), Styler.applymap_index() and Styler.format_index(). These mirror the signature of the methods already used to style and format data values, and work with HTML, LaTeX and Excel formats (GH41893, GH43101, GH41993, GH41995)
- The new method Styler.hide() deprecates Styler.hide_index() and Styler.hide_columns() (GH43758)
- The keyword arguments level and names have been added to Styler.hide() (and implicitly to the deprecated methods Styler.hide_index() and Styler.hide_columns()) for additional control of visibility of MultiIndexes and of Index names (GH25475, GH43404, GH43346)
- Styler.export() and Styler.use() have been updated to address all of the added functionality from v1.2.0 and v1.3.0 (GH40675)
- Global options under the category pd.options.styler have been extended to configure default Styler properties which address formatting, encoding, and HTML and LaTeX rendering. Note that formerly Styler relied on display.html.use_mathjax, which has now been replaced by styler.html.mathjax (GH41395)
- Validation of certain keyword arguments, e.g. caption (GH43368)
- Various bug fixes as recorded below
Additionally, there are enhancements specific to HTML rendering:
- Styler.bar() introduces additional arguments to control alignment and display (GH26070, GH36419), and it also validates the input arguments width and height (GH42511)
- Styler.to_html() introduces keyword arguments sparse_index, sparse_columns, bold_headers, caption, max_rows and max_columns (GH41946, GH43149, GH42972)
- Styler.to_html() omits CSSStyle rules for hidden table elements as a performance enhancement (GH43619)
- Custom CSS classes can now be directly specified without string replacement (GH43686)
- Ability to render hyperlinks automatically via a new hyperlinks formatting keyword argument (GH45058)
There are also some LaTeX specific enhancements:
- Styler.to_latex() introduces keyword argument environment, which also allows a specific “longtable” entry through a separate jinja2 template (GH41866)
- Naive sparsification is now possible for LaTeX without the necessity of including the multirow package (GH43369)
- cline support has been added for MultiIndex row sparsification through a keyword argument (GH45138)
Multi-threaded CSV reading with a new CSV Engine based on pyarrow#
pandas.read_csv() now accepts engine="pyarrow" (requires at least
pyarrow 1.0.1) as an argument, allowing for faster csv parsing on multicore
machines with pyarrow installed. See the I/O docs for
more info. (GH23697, GH43706)
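A minimal sketch of selecting the new engine, assuming pyarrow may or may not be installed. The in-memory data and the fallback to the default parser are illustrative choices, not part of the release notes.

```python
import io

import pandas as pd

raw = b"a,b\n1,2\n3,4\n"

try:
    # Uses the multi-threaded pyarrow parser when a supported pyarrow is installed.
    df = pd.read_csv(io.BytesIO(raw), engine="pyarrow")
except ImportError:
    # Fall back to the default parser if pyarrow is missing or too old.
    df = pd.read_csv(io.BytesIO(raw))
```

For a file this small the engine makes no practical difference; any speed-up would be expected on large files and multicore machines.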
Rank function for rolling and expanding windows#
Added rank function to Rolling and Expanding. The new
function supports the method, ascending, and pct flags of
DataFrame.rank(). The method argument supports min, max, and
average ranking methods.
Example:
In [4]: s = pd.Series([1, 4, 2, 3, 5, 3])
In [5]: s.rolling(3).rank()
Out[5]: 
0    NaN
1    NaN
2    2.0
3    2.0
4    3.0
5    1.5
dtype: float64
In [6]: s.rolling(3).rank(method="max")
Out[6]: 
0    NaN
1    NaN
2    2.0
3    2.0
4    3.0
5    2.0
dtype: float64
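The same method is available on expanding windows. The sketch below reuses the series from the example above; the variable names are illustrative.

```python
import pandas as pd

s = pd.Series([1, 4, 2, 3, 5, 3])

# Rank of each value among all values seen so far
# (ties share the average rank by default).
expanding_rank = s.expanding().rank()

# The same ranks expressed as a fraction of the window rather than a position.
expanding_pct = s.expanding().rank(pct=True)
```

Here the last element (3) ties with an earlier 3, so under the default average method its rank is 3.5.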
Groupby positional indexing#
It is now possible to specify positional ranges relative to the ends of each group.
Negative arguments for GroupBy.head() and GroupBy.tail() now work
correctly and result in ranges relative to the end and start of each group,
respectively. Previously, negative arguments returned empty frames.
In [7]: df = pd.DataFrame([["g", "g0"], ["g", "g1"], ["g", "g2"], ["g", "g3"],
   ...:                    ["h", "h0"], ["h", "h1"]], columns=["A", "B"])
   ...: 
In [8]: df.groupby("A").head(-1)
Out[8]: 
   A   B
0  g  g0
1  g  g1
2  g  g2
4  h  h0
GroupBy.nth() now accepts a slice or list of integers and slices.
In [9]: df.groupby("A").nth(slice(1, -1))
Out[9]: 
    B
A    
g  g1
g  g2
In [10]: df.groupby("A").nth([slice(None, 1), slice(-1, None)])
Out[10]: 
    B
A    
g  g0
g  g3
h  h0
h  h1
GroupBy.nth() now accepts index notation.
In [11]: df.groupby("A").nth[1, -1]
Out[11]: 
    B
A    
g  g1
g  g3
h  h1
In [12]: df.groupby("A").nth[1:-1]
Out[12]: 
    B
A    
g  g1
g  g2
In [13]: df.groupby("A").nth[:1, -1:]
Out[13]: 
    B
A    
g  g0
g  g3
h  h0
h  h1
DataFrame.from_dict and DataFrame.to_dict have new 'tight' option#
A new 'tight' dictionary format that preserves MultiIndex entries
and names is now available with the DataFrame.from_dict() and
DataFrame.to_dict() methods and can be used with the standard json
library to produce a tight representation of DataFrame objects
(GH4889).
In [14]: df = pd.DataFrame.from_records(
   ....:     [[1, 3], [2, 4]],
   ....:     index=pd.MultiIndex.from_tuples([("a", "b"), ("a", "c")],
   ....:                                     names=["n1", "n2"]),
   ....:     columns=pd.MultiIndex.from_tuples([("x", 1), ("y", 2)],
   ....:                                       names=["z1", "z2"]),
   ....: )
   ....: 
In [15]: df
Out[15]: 
z1     x  y
z2     1  2
n1 n2      
a  b   1  3
   c   2  4
In [16]: df.to_dict(orient='tight')
Out[16]: 
{'index': [('a', 'b'), ('a', 'c')],
 'columns': [('x', 1), ('y', 2)],
 'data': [[1, 3], [2, 4]],
 'index_names': ['n1', 'n2'],
 'column_names': ['z1', 'z2']}
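As a sketch of the round trip, the dictionary produced by to_dict(orient="tight") can be passed back to from_dict() with the same orient. The frame below mirrors the example above; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 3], [2, 4]],
    index=pd.MultiIndex.from_tuples([("a", "b"), ("a", "c")], names=["n1", "n2"]),
    columns=pd.MultiIndex.from_tuples([("x", 1), ("y", 2)], names=["z1", "z2"]),
)

# Serialize to the tight format and rebuild the frame from it.
tight = df.to_dict(orient="tight")
restored = pd.DataFrame.from_dict(tight, orient="tight")

# Both MultiIndex entries and their names survive the round trip.
pd.testing.assert_frame_equal(restored, df)
```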
Other enhancements#
- concat() will preserve the attrs when it is the same for all objects and discard the attrs when they are different (GH41828)
- DataFrameGroupBy operations with as_index=False now correctly retain ExtensionDtype dtypes for columns being grouped on (GH41373)
- Add support for assigning values to by argument in DataFrame.plot.hist() and DataFrame.plot.box() (GH15079)
- Series.sample(), DataFrame.sample(), and GroupBy.sample() now accept a np.random.Generator as input to random_state. A generator will be more performant, especially with replace=False (GH38100)
- Series.ewm() and DataFrame.ewm() now support a method argument with a 'table' option that performs the windowing operation over an entire DataFrame. See Window Overview for performance and functional benefits (GH42273)
- GroupBy.cummin() and GroupBy.cummax() now support the argument skipna (GH34047)
- read_table() now supports the argument storage_options (GH39167)
- DataFrame.to_stata() and StataWriter() now accept the keyword only argument value_labels to save labels for non-categorical columns (GH38454)
- Methods that relied on hashmap based algos such as DataFrameGroupBy.value_counts(), DataFrameGroupBy.count() and factorize() no longer ignore the imaginary component of complex numbers (GH17927)
- Add Series.str.removeprefix() and Series.str.removesuffix() introduced in Python 3.9 to remove pre-/suffixes from string-type Series (GH36944)
- Attempting to write into a file in a missing parent directory with DataFrame.to_csv(), DataFrame.to_html(), DataFrame.to_excel(), DataFrame.to_feather(), DataFrame.to_parquet(), DataFrame.to_stata(), DataFrame.to_json(), DataFrame.to_pickle(), and DataFrame.to_xml() now raises an error that explicitly mentions the missing parent directory. The same is true for the Series counterparts (GH24306)
- Indexing with .loc and .iloc now supports Ellipsis (GH37750)
- IntegerArray.all(), IntegerArray.any(), FloatingArray.any(), and FloatingArray.all() use Kleene logic (GH41967)
- Added support for nullable boolean and integer types in DataFrame.to_stata(), StataWriter, StataWriter117, and StataWriterUTF8 (GH40855)
- DataFrame.__pos__() and DataFrame.__neg__() now retain ExtensionDtype dtypes (GH43883)
- The error raised when an optional dependency can’t be imported now includes the original exception, for easier investigation (GH43882) 
- Added ExponentialMovingWindow.sum() (GH13297)
- Series.str.split() now supports a regex argument that explicitly specifies whether the pattern is a regular expression. Default is None (GH43563, GH32835, GH25549)
- DataFrame.dropna() now accepts a single label as subset along with array-like (GH41021)
- Added DataFrameGroupBy.value_counts() (GH43564)
- read_csv() now accepts a callable function in on_bad_lines when engine="python" for custom handling of bad lines (GH5686)
- ExcelWriter argument if_sheet_exists="overlay" option added (GH40231)
- read_excel() now accepts a decimal argument that allows the user to specify the decimal point when parsing string columns to numeric (GH14403)
- GroupBy.mean(), GroupBy.std(), GroupBy.var(), and GroupBy.sum() now support Numba execution with the engine keyword (GH43731, GH44862, GH44939)
- Timestamp.isoformat() now handles the timespec argument from the base datetime class (GH26131)
- NaT.to_numpy() dtype argument is now respected, so np.timedelta64 can be returned (GH44460)
- New option display.max_dir_items customizes the number of columns added to DataFrame.__dir__() and suggested for tab completion (GH37996)
- Added “Juneteenth National Independence Day” to USFederalHolidayCalendar (GH44574)
- Rolling.var(), Expanding.var(), Rolling.std(), and Expanding.std() now support Numba execution with the engine keyword (GH44461)
- Series.info() has been added, for compatibility with DataFrame.info() (GH5167)
- Implemented IntervalArray.min() and IntervalArray.max(), as a result of which min and max now work for IntervalIndex, Series and DataFrame with IntervalDtype (GH44746)
- UInt64Index.map() now retains dtype where possible (GH44609)
- read_json() can now parse unsigned long long integers (GH26068)
- DataFrame.take() now raises a TypeError when passed a scalar for the indexer (GH42875)
- is_list_like() now identifies duck-arrays as list-like unless .ndim == 0 (GH35131)
- ExtensionDtype and ExtensionArray are now (de)serialized when exporting a DataFrame with DataFrame.to_json() using orient='table' (GH20612, GH44705)
- Add support for Zstandard compression to DataFrame.to_pickle()/read_pickle() and friends (GH43925)
- DataFrame.to_sql() now returns an int of the number of written rows (GH23998)
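A few of the smaller enhancements listed above can be tried out as follows. This is only a sketch: the sample data, variable names and the seed passed to np.random.default_rng are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# Series.str.removeprefix(): values without the prefix are left unchanged.
s = pd.Series(["str_foo", "str_bar", "no_prefix"])
stripped = s.str.removeprefix("str_")

# Series.str.split() with an explicit regex flag: "." is treated literally.
parts = pd.Series(["a.b.c"]).str.split(".", regex=False)

# DataFrame.dropna() with a single label for subset.
df = pd.DataFrame({"a": [1.0, None, 3.0], "b": [None, 5.0, 6.0]})
without_missing_a = df.dropna(subset="a")

# sample() with a NumPy Generator as random_state.
sampled = df.sample(n=2, random_state=np.random.default_rng(0))
```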
Notable bug fixes#
These are bug fixes that might have notable behavior changes.
Inconsistent date string parsing#
The dayfirst option of to_datetime() isn’t strict, and this can lead
to surprising behavior:
In [17]: pd.to_datetime(["31-12-2021"], dayfirst=False)
Out[17]: DatetimeIndex(['2021-12-31'], dtype='datetime64[ns]', freq=None)
Now, a warning will be raised if a date string cannot be parsed in accordance with
the given dayfirst value when the value is a delimited date string (e.g.
31-12-2012).
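A minimal sketch of observing the warning and of sidestepping the ambiguity by passing an explicit format. The exact warning text and category may differ between pandas versions, so the sketch only collects whatever is emitted.

```python
import warnings

import pandas as pd

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # "31" cannot be a month, so the string is parsed day-first
    # even though dayfirst=False was requested.
    result = pd.to_datetime(["31-12-2021"], dayfirst=False)

# Any warnings raised during parsing, as plain strings.
messages = [str(w.message) for w in caught]

# An explicit format states the intended layout and leaves nothing to infer.
explicit = pd.to_datetime(["31-12-2021"], format="%d-%m-%Y")
```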
Ignoring dtypes in concat with empty or all-NA columns#
When using concat() to concatenate two or more DataFrame objects,
if one of the DataFrames was empty or had all-NA values, its dtype was
sometimes ignored when finding the concatenated dtype.  These are now
consistently not ignored (GH43507).
In [18]: df1 = pd.DataFrame({"bar": [pd.Timestamp("2013-01-01")]}, index=range(1))
In [19]: df2 = pd.DataFrame({"bar": np.nan}, index=range(1, 2))
In [20]: res = pd.concat([df1, df2])
Previously, the float-dtype in df2 would be ignored so the result dtype
would be datetime64[ns]. As a result, the np.nan would be cast to
NaT.
Previous behavior:
In [4]: res
Out[4]:
         bar
0 2013-01-01
1        NaT
Now the float-dtype is respected. Since the common dtype for these DataFrames is
object, the np.nan is retained.
New behavior:
In [21]: res
Out[21]: 
                   bar
0  2013-01-01 00:00:00
1                  NaN
Null-values are no longer coerced to NaN-value in value_counts and mode#
Series.value_counts() and Series.mode() no longer coerce None,
NaT and other null-values to a NaN-value for np.object-dtype. This
behavior is now consistent with unique, isin and others
(GH42688).
In [22]: s = pd.Series([True, None, pd.NaT, None, pd.NaT, None])
In [23]: res = s.value_counts(dropna=False)
Previously, all null-values were replaced by a NaN-value.
Previous behavior:
In [3]: res
Out[3]:
NaN     5
True    1
dtype: int64
Now null-values are no longer mangled.
New behavior:
In [24]: res
Out[24]: 
None    3
NaT     2
True    1
dtype: int64
mangle_dupe_cols in read_csv no longer renames unique columns conflicting with target names#
read_csv() no longer renames unique column labels which conflict with the target
names of duplicated columns. Already existing columns are skipped, i.e. the next
available index is used for the target column name (GH14704).
In [25]: import io
In [26]: data = "a,a,a.1\n1,2,3"
In [27]: res = pd.read_csv(io.StringIO(data))
Previously, the second column was called a.1, while the third column was
also renamed to a.1.1.
Previous behavior:
In [3]: res
Out[3]:
    a  a.1  a.1.1
0   1    2      3
Now the renaming checks if a.1 already exists when changing the name of the
second column and skips this index. The second column is instead renamed to
a.2.
New behavior:
In [28]: res
Out[28]: 
   a  a.2  a.1
0  1    2    3
unstack and pivot_table no longer raises ValueError for result that would exceed int32 limit#
Previously DataFrame.pivot_table() and DataFrame.unstack() would
raise a ValueError if the operation could produce a result with more than
2**31 - 1 elements. This operation now raises an
errors.PerformanceWarning instead (GH26314).
Previous behavior:
In [3]: df = pd.DataFrame({"ind1": np.arange(2 ** 16), "ind2": np.arange(2 ** 16), "count": 0})
In [4]: df.pivot_table(index="ind1", columns="ind2", values="count", aggfunc="count")
ValueError: Unstacked DataFrame is too big, causing int32 overflow
New behavior:
In [4]: df.pivot_table(index="ind1", columns="ind2", values="count", aggfunc="count")
PerformanceWarning: The following operation may generate 4294967296 cells in the resulting pandas object.
groupby.apply consistent transform detection#
GroupBy.apply() is designed to be flexible, allowing users to perform
aggregations, transformations, filters, and use it with user-defined functions
that might not fall into any of these categories. As part of this, apply will
attempt to detect when an operation is a transform, and in such a case, the
result will have the same index as the input. In order to determine if the
operation is a transform, pandas compares the input’s index to the result’s and
determines if it has been mutated. Previously in pandas 1.3, different code
paths used different definitions of “mutated”: some would use Python’s is
whereas others would test only up to equality.
This inconsistency has been removed; pandas now tests up to equality.
In [29]: def func(x):
   ....:     return x.copy()
   ....: 
In [30]: df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})
In [31]: df
Out[31]: 
   a  b  c
0  1  3  5
1  2  4  6
Previous behavior:
In [3]: df.groupby(['a']).apply(func)
Out[3]:
     a  b  c
a
1 0  1  3  5
2 1  2  4  6
In [4]: df.set_index(['a', 'b']).groupby(['a']).apply(func)
Out[4]:
     c
a b
1 3  5
2 4  6
In the examples above, the first uses a code path where pandas uses is and
determines that func is not a transform whereas the second tests up to
equality and determines that func is a transform. In the first case, the
result’s index is not the same as the input’s.
New behavior:
In [32]: df.groupby(['a']).apply(func)
Out[32]: 
   a  b  c
0  1  3  5
1  2  4  6
In [33]: df.set_index(['a', 'b']).groupby(['a']).apply(func)
Out[33]: 
     c
a b   
1 3  5
2 4  6
Now in both cases it is determined that func is a transform. In each case,
the result has the same index as the input.
Backwards incompatible API changes#
Increased minimum version for Python#
pandas 1.4.0 supports Python 3.8 and higher.
Increased minimum versions for dependencies#
Some minimum supported versions of dependencies were updated. If installed, we now require:
| Package | Minimum Version | Required | Changed | 
|---|---|---|---|
| numpy | 1.18.5 | X | X | 
| pytz | 2020.1 | X | X | 
| python-dateutil | 2.8.1 | X | X | 
| bottleneck | 1.3.1 | | X |
| numexpr | 2.7.1 | | X |
| pytest (dev) | 6.0 | | |
| mypy (dev) | 0.930 | | X |
For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.
| Package | Minimum Version | Changed | 
|---|---|---|
| beautifulsoup4 | 4.8.2 | X | 
| fastparquet | 0.4.0 | |
| fsspec | 0.7.4 | |
| gcsfs | 0.6.0 | |
| lxml | 4.5.0 | X | 
| matplotlib | 3.3.2 | X | 
| numba | 0.50.1 | X | 
| openpyxl | 3.0.3 | X | 
| pandas-gbq | 0.14.0 | X | 
| pyarrow | 1.0.1 | X | 
| pymysql | 0.10.1 | X | 
| pytables | 3.6.1 | X | 
| s3fs | 0.4.0 | |
| scipy | 1.4.1 | X | 
| sqlalchemy | 1.4.0 | X | 
| tabulate | 0.8.7 | |
| xarray | 0.15.1 | X | 
| xlrd | 2.0.1 | X | 
| xlsxwriter | 1.2.2 | X | 
| xlwt | 1.3.0 | |
See Dependencies and Optional dependencies for more.
Other API changes#
- Index.get_indexer_for() no longer accepts keyword arguments (other than target); in the past these would be silently ignored if the index was not unique (GH42310)
- Change in the position of the min_rows argument in DataFrame.to_string() due to change in the docstring (GH44304)
- Reduction operations for DataFrame or Series now raise a ValueError when None is passed for skipna (GH44178)
- read_csv() and read_html() no longer raise an error when one of the header rows consists only of Unnamed: columns (GH13054)
- Changed the name attribute of several holidays in USFederalHolidayCalendar to match official federal holiday names, specifically:
  - “New Year’s Day” gains the possessive apostrophe
  - “Presidents Day” becomes “Washington’s Birthday”
  - “Martin Luther King Jr. Day” is now “Birthday of Martin Luther King, Jr.”
  - “July 4th” is now “Independence Day”
  - “Thanksgiving” is now “Thanksgiving Day”
  - “Christmas” is now “Christmas Day”
  - Added “Juneteenth National Independence Day”
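As an illustration of the skipna change listed above, the sketch below shows a reduction rejecting skipna=None and the explicit alternative. The series and variable names are made up for the example.

```python
import pandas as pd

s = pd.Series([1.0, None, 3.0])

try:
    s.sum(skipna=None)  # None is no longer accepted for skipna
    raised = False
except ValueError:
    raised = True

# Pass an explicit boolean instead.
total = s.sum(skipna=True)
```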
 
Deprecations#
Deprecated Int64Index, UInt64Index & Float64Index#
Int64Index, UInt64Index and Float64Index have been
deprecated in favor of the base Index class and will be removed in
pandas 2.0 (GH43028).
For constructing a numeric index, you can use the base Index class
instead, specifying the data type (which will also work on older pandas
releases):
# replace
pd.Int64Index([1, 2, 3])
# with
pd.Index([1, 2, 3], dtype="int64")
For checking the data type of an index object, you can replace isinstance
checks with checking the dtype:
# replace
isinstance(idx, pd.Int64Index)
# with
idx.dtype == "int64"
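When the exact bit width does not matter, a dtype check can also be written with the helpers in pandas.api.types. This is a sketch of one possible approach rather than a recommendation from the release notes; the variable names are illustrative.

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype

int_idx = pd.Index([1, 2, 3], dtype="int64")
float_idx = pd.Index([1.5, 2.5], dtype="float64")

# True for any signed or unsigned integer width, not only int64.
int_is_integer = is_integer_dtype(int_idx.dtype)

# Float indexes are recognised by their dtype in the same way.
float_is_float = is_float_dtype(float_idx.dtype)
float_is_integer = is_integer_dtype(float_idx.dtype)
```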
Currently, in order to maintain backward compatibility, calls to Index
will continue to return Int64Index, UInt64Index and
Float64Index when given numeric data, but in the future, an
Index will be returned.
Current behavior:
In [1]: pd.Index([1, 2, 3], dtype="int32")
Out [1]: Int64Index([1, 2, 3], dtype='int64')
In [2]: pd.Index([1, 2, 3], dtype="uint64")
Out [2]: UInt64Index([1, 2, 3], dtype='uint64')
Future behavior:
In [3]: pd.Index([1, 2, 3], dtype="int32")
Out [3]: Index([1, 2, 3], dtype='int32')
In [4]: pd.Index([1, 2, 3], dtype="uint64")
Out [4]: Index([1, 2, 3], dtype='uint64')
Deprecated DataFrame.append and Series.append#
DataFrame.append() and Series.append() have been deprecated and will
be removed in a future version. Use pandas.concat() instead (GH35407).
Deprecated syntax
In [1]: pd.Series([1, 2]).append(pd.Series([3, 4]))
Out [1]:
<stdin>:1: FutureWarning: The series.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
0    1
1    2
0    3
1    4
dtype: int64
In [2]: df1 = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
In [3]: df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))
In [4]: df1.append(df2)
Out [4]:
<stdin>:1: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
   A  B
0  1  2
1  3  4
0  5  6
1  7  8
Recommended syntax
In [34]: pd.concat([pd.Series([1, 2]), pd.Series([3, 4])])
Out[34]: 
0    1
1    2
0    3
1    4
dtype: int64
In [35]: df1 = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
In [36]: df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))
In [37]: pd.concat([df1, df2])
Out[37]: 
   A  B
0  1  2
1  3  4
0  5  6
1  7  8
Other Deprecations#
- Deprecated Index.is_type_compatible() (GH42113)
- Deprecated method argument in Index.get_loc(), use index.get_indexer([label], method=...) instead (GH42269)
- Deprecated treating integer keys in Series.__setitem__() as positional when the index is a Float64Index not containing the key, an IntervalIndex with no entries containing the key, or a MultiIndex with leading Float64Index level not containing the key (GH33469)
- Deprecated treating numpy.datetime64 objects as UTC times when passed to the Timestamp constructor along with a timezone. In a future version, these will be treated as wall-times. To retain the old behavior, use Timestamp(dt64).tz_localize("UTC").tz_convert(tz) (GH24559)
- Deprecated ignoring missing labels when indexing with a sequence of labels on a level of a MultiIndex (GH42351)
- Creating an empty Series without a dtype will now raise a more visible FutureWarning instead of a DeprecationWarning (GH30017)
- Deprecated the kind argument in Index.get_slice_bound(), Index.slice_indexer(), and Index.slice_locs(); in a future version passing kind will raise (GH42857)
- Deprecated dropping of nuisance columns in Rolling, Expanding, and EWM aggregations (GH42738)
- Deprecated Index.reindex() with a non-unique Index (GH42568)
- Deprecated Styler.render() in favor of Styler.to_html() (GH42140)
- Deprecated Styler.hide_index() and Styler.hide_columns() in favor of Styler.hide() (GH43758)
- Deprecated passing in a string column label into times in DataFrame.ewm() (GH43265)
- Deprecated the include_start and include_end arguments in DataFrame.between_time(); in a future version passing include_start or include_end will raise (GH40245)
- Deprecated the squeeze argument to read_csv(), read_table(), and read_excel(). Users should squeeze the DataFrame afterwards with .squeeze("columns") instead (GH43242)
- Deprecated the index argument to SparseArray construction (GH23089)
- Deprecated the closed argument in date_range() and bdate_range() in favor of the inclusive argument; in a future version passing closed will raise (GH40245)
- Deprecated Rolling.validate(), Expanding.validate(), and ExponentialMovingWindow.validate() (GH43665)
- Deprecated silent dropping of columns that raised a TypeError in Series.transform and DataFrame.transform when used with a dictionary (GH43740)
- Deprecated silent dropping of columns that raised a TypeError, DataError, and some cases of ValueError in Series.aggregate(), DataFrame.aggregate(), Series.groupby.aggregate(), and DataFrame.groupby.aggregate() when used with a list (GH43740)
- Deprecated casting behavior when setting timezone-aware value(s) into a timezone-aware Series or DataFrame column when the timezones do not match. Previously this cast to object dtype. In a future version, the values being inserted will be converted to the series or column’s existing timezone (GH37605)
- Deprecated casting behavior when passing an item with mismatched-timezone to DatetimeIndex.insert(), DatetimeIndex.putmask(), DatetimeIndex.where(), DatetimeIndex.fillna(), Series.mask(), Series.where(), Series.fillna(), Series.shift(), Series.replace(), Series.reindex() (and DataFrame column analogues). In the past this has cast to object dtype. In a future version, these will cast the passed item to the index or series’s timezone (GH37605, GH44940)
- Deprecated the prefix keyword argument in read_csv() and read_table(); in a future version the argument will be removed (GH43396)
- Deprecated passing non boolean argument to sort in concat() (GH41518)
- Deprecated passing arguments as positional for read_fwf() other than filepath_or_buffer (GH41485)
- Deprecated passing arguments as positional for read_xml() other than path_or_buffer (GH45133)
- Deprecated passing skipna=None for DataFrame.mad() and Series.mad(), pass skipna=True instead (GH44580)
- Deprecated the behavior of to_datetime() with the string “now” with utc=False; in a future version this will match Timestamp("now"), which in turn matches Timestamp.now() returning the local time (GH18705)
- Deprecated DateOffset.apply(), use offset + other instead (GH44522)
- Deprecated parameter names in Index.copy() (GH44916)
- A deprecation warning is now shown for DataFrame.to_latex() indicating the arguments signature may change and emulate more the arguments to Styler.to_latex() in future versions (GH44411)
- Deprecated behavior of concat() between objects with bool-dtype and numeric-dtypes; in a future version these will cast to object dtype instead of coercing bools to numeric values (GH39817)
- Deprecated Categorical.replace(), use Series.replace() instead (GH44929)
- Deprecated passing set or dict as indexer for DataFrame.loc.__setitem__(), DataFrame.loc.__getitem__(), Series.loc.__setitem__(), Series.loc.__getitem__(), DataFrame.__getitem__(), Series.__getitem__() and Series.__setitem__() (GH42825)
- Deprecated Index.__getitem__() with a bool key; use index.values[key] to get the old behavior (GH44051)
- Deprecated downcasting column-by-column in DataFrame.where() with integer-dtypes (GH44597)
- Deprecated DatetimeIndex.union_many(), use DatetimeIndex.union() instead (GH44091)
- Deprecated Groupby.pad() in favor of Groupby.ffill() (GH33396)
- Deprecated Groupby.backfill() in favor of Groupby.bfill() (GH33396)
- Deprecated Resample.pad() in favor of Resample.ffill() (GH33396)
- Deprecated Resample.backfill() in favor of Resample.bfill() (GH33396)
- Deprecated numeric_only=None in DataFrame.rank(); in a future version numeric_only must be either True or False (the default) (GH45036)
- Deprecated the behavior of Timestamp.utcfromtimestamp(), in the future it will return a timezone-aware UTC Timestamp (GH22451)
- Deprecated NaT.freq() (GH45071)
- Deprecated behavior of Series and DataFrame construction when passed float-dtype data containing NaN and an integer dtype ignoring the dtype argument; in a future version this will raise (GH40110)
- Deprecated the behaviour of Series.to_frame() and Index.to_frame() to ignore the name argument when name=None. Currently, this means to preserve the existing name, but in the future explicitly passing name=None will set None as the name of the column in the resulting DataFrame (GH44212)
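Several of the deprecations above name a direct replacement. The sketch below shows a few of those replacements side by side; the data and variable names are invented for illustration, and each comment names the deprecated spelling being replaced.

```python
import io

import pandas as pd

# Index.get_loc(label, method=...)  ->  Index.get_indexer([label], method=...)
idx = pd.Index([10, 20, 30])
nearest_pos = idx.get_indexer([19], method="nearest")[0]

# read_csv(..., squeeze=True)  ->  read_csv(...).squeeze("columns")
ser = pd.read_csv(io.StringIO("a\n1\n2\n")).squeeze("columns")

# date_range(..., closed="left")  ->  date_range(..., inclusive="left")
dates = pd.date_range("2022-01-01", "2022-01-04", inclusive="left")

# GroupBy.pad()  ->  GroupBy.ffill()
df = pd.DataFrame({"key": ["x", "x", "y"], "val": [1.0, None, 3.0]})
filled = df.groupby("key")["val"].ffill()
```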
Performance improvements#
- Performance improvement in GroupBy.sample(), especially when the weights argument is provided (GH34483)
- Performance improvement when converting non-string arrays to string arrays (GH34483)
- Performance improvement in GroupBy.transform() for user-defined functions (GH41598)
- Performance improvement in constructing DataFrame objects (GH42631, GH43142, GH43147, GH43307, GH43144, GH44826)
- Performance improvement in GroupBy.shift() when fill_value argument is provided (GH26615)
- Performance improvement in DataFrame.corr() for method=pearson on data without missing values (GH40956)
- Performance improvement in some GroupBy.apply() operations (GH42992, GH43578)
- Performance improvement in read_stata() (GH43059, GH43227)
- Performance improvement in read_sas() (GH43333)
- Performance improvement in to_datetime() with uint dtypes (GH42606)
- Performance improvement in to_datetime() with infer_datetime_format set to True (GH43901)
- Performance improvement in Series.sparse.to_coo() (GH42880)
- Performance improvement in indexing with a UInt64Index (GH43862)
- Performance improvement in indexing with a Float64Index (GH43705)
- Performance improvement in indexing with a non-unique Index (GH43792)
- Performance improvement in indexing with a listlike indexer on a MultiIndex (GH43370)
- Performance improvement in indexing with a MultiIndex indexer on another MultiIndex (GH43370)
- Performance improvement in GroupBy.quantile() (GH43469, GH43725)
- Performance improvement in GroupBy.count() (GH43730, GH43694)
- Performance improvement in GroupBy.any() and GroupBy.all() (GH43675, GH42841)
- Performance improvement in GroupBy.cumsum() (GH43309)
- SparseArray.min() and SparseArray.max() no longer require converting to a dense array (GH43526)
- Indexing into a SparseArray with a slice with step=1 no longer requires converting to a dense array (GH43777)
- Performance improvement in SparseArray.take() with allow_fill=False (GH43654)
- Performance improvement in Rolling.mean(), Expanding.mean(), Rolling.sum(), Expanding.sum(), Rolling.max(), Expanding.max(), Rolling.min() and Expanding.min() with engine="numba" (GH43612, GH44176, GH45170)
- Improved performance of pandas.read_csv() with memory_map=True when file encoding is UTF-8 (GH43787)
- Performance improvement in RangeIndex.sort_values() overriding Index.sort_values() (GH43666)
- Performance improvement in RangeIndex.insert() (GH43988)
- Performance improvement in Index.insert() (GH43953)
- Performance improvement in DatetimeIndex.tolist() (GH43823)
- Performance improvement in DatetimeIndex.union() (GH42353)
- Performance improvement in Series.nsmallest() (GH43696)
- Performance improvement in DataFrame.insert() (GH42998)
- Performance improvement in DataFrame.dropna() (GH43683)
- Performance improvement in DataFrame.fillna() (GH43316)
- Performance improvement in DataFrame.values (GH43160)
- Performance improvement in DataFrame.select_dtypes() (GH42611)
- Performance improvement in DataFrame reductions (GH43185, GH43243, GH43311, GH43609)
- Performance improvement in Series.unstack() and DataFrame.unstack() (GH43335, GH43352, GH42704, GH43025)
- Performance improvement in Series.to_frame() (GH43558)
- Performance improvement in Series.mad() (GH43010)
- Performance improvement in to_csv() when index column is a datetime and is formatted (GH39413)
- Performance improvement in to_csv() when MultiIndex contains a lot of unused levels (GH37484)
- Performance improvement in read_csv() when index_col was set with a numeric column (GH44158)
- Performance improvement in SparseArray.__getitem__() (GH23122)
- Performance improvement in constructing a DataFrame from array-like objects like a PyTorch tensor (GH44616)
Bug fixes#
Categorical#
- Bug in setting dtype-incompatible values into a Categorical (or Series or DataFrame backed by Categorical) raising ValueError instead of TypeError (GH41919)
- Bug in Categorical.searchsorted() when passing a dtype-incompatible value raising KeyError instead of TypeError (GH41919)
- Bug in Categorical.astype() casting datetimes and Timestamp to int for dtype object (GH44930)
- Bug in Series.where() with CategoricalDtype when passing a dtype-incompatible value raising ValueError instead of TypeError (GH41919)
- Bug in Categorical.fillna() when passing a dtype-incompatible value raising ValueError instead of TypeError (GH41919)
- Bug in Categorical.fillna() with a tuple-like category raising ValueError instead of TypeError when filling with a non-category tuple (GH41919)
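The exception-type change described in the entries above can be checked with a short script. This is a minimal sketch, assuming pandas 1.4 or later; the variable names are illustrative and not part of the release notes.

```python
import pandas as pd

cat = pd.Categorical(["a", "b", "a"])

try:
    # "c" is not one of the existing categories, so it is dtype-incompatible
    cat[0] = "c"
except TypeError as err:
    raised = type(err).__name__
else:
    raised = None

print(raised)
```

Catching TypeError rather than ValueError is the pattern that matches the behavior listed in this section.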
Datetimelike#
- Bug in DataFrame constructor unnecessarily copying non-datetimelike 2D object arrays (GH39272)
- Bug in to_datetime() with format and pandas.NA was raising ValueError (GH42957)
- to_datetime() would silently swap MM/DD/YYYY and DD/MM/YYYY formats if the given dayfirst option could not be respected; now a warning is raised in the case of delimited date strings (e.g. 31-12-2012) (GH12585)
- Bug in date_range() and bdate_range() not returning the right bound when start = end and the set is closed on one side (GH43394)
- Bug in inplace addition and subtraction of DatetimeIndex or TimedeltaIndex with DatetimeArray or TimedeltaArray (GH43904)
- Bug in calling np.isnan, np.isfinite, or np.isinf on a timezone-aware DatetimeIndex incorrectly raising TypeError (GH43917)
- Bug in constructing a Series from datetime-like strings with mixed timezones incorrectly partially-inferring datetime values (GH40111)
- Bug in addition of a Tick object and a np.timedelta64 object incorrectly raising instead of returning Timedelta (GH44474)
- np.maximum.reduce and np.minimum.reduce now correctly return Timestamp and Timedelta objects when operating on Series, DataFrame, or Index with datetime64[ns] or timedelta64[ns] dtype (GH43923)
- Bug in adding a np.timedelta64 object to a BusinessDay or CustomBusinessDay object incorrectly raising (GH44532)
- Bug in Index.insert() for inserting np.datetime64, np.timedelta64 or tuple into Index with dtype='object' with negative loc adding None and replacing existing value (GH44509)
- Bug in Timestamp.to_pydatetime() failing to retain the fold attribute (GH45087)
- Bug in Series.mode() with DatetimeTZDtype incorrectly returning timezone-naive and PeriodDtype incorrectly raising (GH41927)
- Fixed regression in reindex() raising an error when using an incompatible fill value with a datetime-like dtype (or not raising a deprecation warning for using a datetime.date as fill value) (GH42921)
- Bug in DateOffset addition with Timestamp where offset.nanoseconds would not be included in the result (GH43968, GH36589)
- Bug in Timestamp.fromtimestamp() not supporting the tz argument (GH45083)
- Bug in DataFrame construction from dict of Series with mismatched index dtypes sometimes raising depending on the ordering of the passed dict (GH44091)
- Bug in Timestamp hashing during some DST transitions caused a segmentation fault (GH33931 and GH40817)
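Two of the fixes above are easy to exercise directly: the tz argument of Timestamp.fromtimestamp() and adding a np.timedelta64 to a Tick offset. The snippet below is a minimal sketch, assuming pandas 1.4 or later; the chosen values and variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# fromtimestamp() accepts a tz argument; POSIX timestamp 0 is the Unix epoch
epoch_utc = pd.Timestamp.fromtimestamp(0, tz="UTC")

# Adding a np.timedelta64 to a Tick offset (here, one hour) no longer raises
total = pd.offsets.Hour(1) + np.timedelta64(30, "m")

print(epoch_utc)
print(pd.Timedelta(total))
```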
Timedelta#
- Bug in division of all-NaT TimedeltaIndex, Series or DataFrame column with object-dtype array-like of numbers failing to infer the result as timedelta64-dtype (GH39750)
- Bug in floor division of timedelta64[ns] data with a scalar returning garbage values (GH44466)
- Bug in Timedelta not properly taking into account the nanoseconds contribution of any kwarg; this is now handled correctly (GH43764, GH45227)
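As a small illustration of keyword arguments contributing nanoseconds, the sketch below builds a Timedelta from integer components and inspects the total. It assumes pandas 1.4 or later and deliberately uses integer kwargs; the issues referenced above also cover other inputs that are not reproduced here.

```python
import pandas as pd

# One microsecond plus one nanosecond
td = pd.Timedelta(microseconds=1, nanoseconds=1)

# .value is the total duration expressed in nanoseconds
total_ns = td.value
print(total_ns)  # 1001
```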
Time Zones#
- Bug in to_datetime() with infer_datetime_format=True failing to parse zero UTC offset (Z) correctly (GH41047)
- Bug in Series.dt.tz_convert() resetting index in a Series with CategoricalIndex (GH43080)
- Bug in Timestamp and DatetimeIndex incorrectly raising a TypeError when subtracting two timezone-aware objects with mismatched timezones (GH31793)
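The subtraction case in the last entry can be sketched as follows, assuming pandas 1.4 or later and an available time zone database. The two timestamps are arbitrary examples chosen so the expected difference is easy to verify by hand.

```python
import pandas as pd

utc_noon = pd.Timestamp("2021-01-01 12:00", tz="UTC")
ny_noon = pd.Timestamp("2021-01-01 12:00", tz="America/New_York")

# Noon in New York is 17:00 UTC in January, so the difference is -5 hours
diff = utc_noon - ny_noon
print(diff)
```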
Numeric#
- Bug in floor-dividing a list or tuple of integers by a Series incorrectly raising (GH44674)
- Bug in DataFrame.rank() raising ValueError with object columns and method="first" (GH41931)
- Bug in DataFrame.rank() treating missing values and extreme values as equal (for example np.nan and np.inf), causing incorrect results when na_option="bottom" or na_option="top" is used (GH41931)
- Bug in numexpr engine still being used when the option compute.use_numexpr is set to False (GH32556)
- Bug in DataFrame arithmetic ops with a subclass whose _constructor() attribute is a callable other than the subclass itself (GH43201)
- Bug in arithmetic operations involving RangeIndex where the result would have the incorrect name (GH43962)
- Bug in arithmetic operations involving Series where the result could have the incorrect name when the operands have matching NA or matching tuple names (GH44459)
- Bug in division with IntegerDtype or BooleanDtype array and NA scalar incorrectly raising (GH44685)
- Bug in multiplying a Series with FloatingDtype with a timedelta-like scalar incorrectly raising (GH44772)
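The floor-division and rank() entries above can be illustrated together. This is a minimal sketch, assuming pandas 1.4 or later; the data is made up for the example.

```python
import numpy as np
import pandas as pd

ser = pd.Series([3, 4, 5])

# A plain list on the left-hand side is floor-divided element-wise
quotients = [10, 20, 30] // ser

# np.inf and np.nan are ranked separately: NaN goes last with na_option="bottom"
df = pd.DataFrame({"a": [np.nan, np.inf, 1.0]})
ranks = df.rank(na_option="bottom")

print(quotients.tolist())
print(ranks["a"].tolist())
```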
Conversion#
- Bug in UInt64Index constructor when passing a list containing both positive integers small enough to cast to int64 and integers too large to hold in int64 (GH42201)
- Bug in Series constructor returning 0 for missing values with dtype int64 and False for dtype bool (GH43017, GH43018)
- Bug in constructing a DataFrame from a PandasArray containing Series objects behaving differently than an equivalent np.ndarray (GH43986)
- Bug in IntegerDtype not allowing coercion from string dtype (GH25472)
- Bug in to_datetime() raising TypeError when arg is an xr.DataArray and unit="ns" is specified (GH44053)
- Bug in DataFrame.convert_dtypes() not returning the correct type when a subclass does not overload _constructor_sliced() (GH43201)
- Bug in DataFrame.astype() not propagating attrs from the original DataFrame (GH44414)
- Bug in DataFrame.convert_dtypes() result losing columns.names (GH41435)
- Bug in constructing an IntegerArray from pyarrow data failing to validate dtypes (GH44891)
- Bug in Series.astype() not allowing converting from a PeriodDtype to datetime64 dtype, inconsistent with the PeriodIndex behavior (GH45038)
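Two of the conversion fixes above lend themselves to a quick check: attrs propagation through DataFrame.astype() and coercion from string dtype to a nullable integer dtype. The sketch assumes pandas 1.4 or later; the "source" metadata key is a hypothetical example.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2]})
df.attrs["source"] = "sensor-1"  # hypothetical metadata for illustration

# attrs set on the original frame are carried over to the converted frame
converted = df.astype("float64")

# String dtype can be coerced to a nullable integer dtype
strings = pd.Series(["1", "2", None], dtype="string")
ints = strings.astype("Int64")

print(converted.attrs)
print(ints.dtype)
```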
Strings#
- Bug in checking for string[pyarrow] dtype incorrectly raising an ImportError when pyarrow is not installed (GH44276)
Interval#
- Bug in Series.where() with IntervalDtype incorrectly raising when the where call should not replace anything (GH44181)
Indexing#
- Bug in Series.rename() with MultiIndex when level is provided (GH43659)
- Bug in DataFrame.truncate() and Series.truncate() when the object’s Index has a length greater than one but only one unique value (GH42365)
- Bug in Series.loc() and DataFrame.loc() with a MultiIndex when indexing with a tuple in which one of the levels is also a tuple (GH27591)
- Bug in Series.loc() with a MultiIndex whose first level contains only np.nan values (GH42055)
- Bug in indexing on a Series or DataFrame with a DatetimeIndex where, when passing a string, the return type depended on whether the index was monotonic (GH24892)
- Bug in indexing on a MultiIndex failing to drop scalar levels when the indexer is a tuple containing a datetime-like string (GH42476)
- Bug in DataFrame.sort_values() and Series.sort_values() failing to raise, or incorrectly raising, ValueError when passing an ascending value (GH41634)
- Bug in updating values of a pandas.Series using a boolean index created with pandas.DataFrame.pop() (GH42530)
- Bug in Index.get_indexer_non_unique() when index contains multiple np.nan (GH35392)
- Bug in DataFrame.query() not handling the degree sign in a backticked column name, such as `Temp(°C)`, used in an expression to query a DataFrame (GH42826)
- Bug in DataFrame.drop() where the error message did not show missing labels with commas when raising KeyError (GH42881)
- Bug in DataFrame.query() where method calls in query strings led to errors when the numexpr package was installed (GH22435)
- Bug in DataFrame.nlargest() and Series.nlargest() where sorted result did not count indexes containing np.nan (GH28984)
- Bug in indexing on a non-unique object-dtype Index with an NA scalar (e.g. np.nan) (GH43711)
- Bug in DataFrame.__setitem__() incorrectly writing into an existing column’s array rather than setting a new array when the new dtype and the old dtype match (GH43406)
- Bug in setting floating-dtype values into a Series with integer dtype failing to set inplace when those values can be losslessly converted to integers (GH44316)
- Bug in Series.__setitem__() with object dtype when setting an array with matching size and dtype='datetime64[ns]' or dtype='timedelta64[ns]' incorrectly converting the datetime/timedeltas to integers (GH43868)
- Bug in DataFrame.sort_index() where ignore_index=True was not being respected when the index was already sorted (GH43591)
- Bug in Index.get_indexer_non_unique() when index contains multiple np.datetime64("NaT") and np.timedelta64("NaT") (GH43869)
- Bug in setting a scalar Interval value into a Series with IntervalDtype when the scalar’s sides are floats and the values’ sides are integers (GH44201)
- Bug when setting string-backed Categorical values that can be parsed to datetimes into a DatetimeArray or Series or DataFrame column backed by DatetimeArray failing to parse these strings (GH44236)
- Bug in Series.__setitem__() with an integer dtype other than int64 setting with a range object unnecessarily upcasting to int64 (GH44261)
- Bug in Series.__setitem__() with a boolean mask indexer setting a listlike value of length 1 incorrectly broadcasting that value (GH44265)
- Bug in Series.reset_index() not ignoring name argument when drop and inplace are set to True (GH44575)
- Bug in DataFrame.loc.__setitem__() and DataFrame.iloc.__setitem__() with mixed dtypes sometimes failing to operate in-place (GH44345)
- Bug in DataFrame.loc.__getitem__() incorrectly raising KeyError when selecting a single column with a boolean key (GH44322)
- Bug in setting DataFrame.iloc() with a single ExtensionDtype column and setting 2D values e.g. df.iloc[:] = df.values incorrectly raising (GH44514)
- Bug in setting values with DataFrame.iloc() with a single ExtensionDtype column and a tuple of arrays as the indexer (GH44703)
- Bug in indexing on columns with loc or iloc using a slice with a negative step with ExtensionDtype columns incorrectly raising (GH44551)
- Bug in DataFrame.loc.__setitem__() changing dtype when indexer was completely False (GH37550)
- Bug in IntervalIndex.get_indexer_non_unique() returning boolean mask instead of array of integers for a non unique and non monotonic index (GH44084)
- Bug in IntervalIndex.get_indexer_non_unique() not handling targets of dtype 'object' with NaNs correctly (GH44482)
- Fixed regression where a single column np.matrix was no longer coerced to a 1d np.ndarray when added to a DataFrame (GH42376)
- Bug in Series.__getitem__() with a CategoricalIndex of integers treating lists of integers as positional indexers, inconsistent with the behavior with a single scalar integer (GH15470, GH14865)
- Bug in Series.__setitem__() when setting floats or integers into integer-dtype Series failing to upcast when necessary to retain precision (GH45121)
- Bug in DataFrame.iloc.__setitem__() ignoring the axis argument (GH45032)
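Two of the indexing entries above, sort_index() with ignore_index=True on an already sorted index and query() with a degree sign in a backticked column name, can be tried with the sketch below. It assumes pandas 1.4 or later; the frame contents are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({"Temp(°C)": [20, 30]}, index=[10, 20])

# The index is already sorted; ignore_index=True still resets it to 0..n-1
reset = df.sort_index(ignore_index=True)

# Backticks allow a column name containing parentheses and a degree sign
warm = df.query("`Temp(°C)` > 25")

print(reset.index.tolist())
print(warm)
```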
Missing#
- Bug in DataFrame.fillna() with limit and no method ignoring axis='columns' or axis=1 (GH40989, GH17399)
- Bug in DataFrame.fillna() not replacing missing values when using a dict-like value and duplicate column names (GH43476)
- Bug in constructing a DataFrame from a dictionary with np.datetime64 as a value and dtype='timedelta64[ns]', or vice-versa, incorrectly casting instead of raising (GH44428)
- Bug in Series.interpolate() and DataFrame.interpolate() with inplace=True not writing to the underlying array(s) in-place (GH44749)
- Bug in Index.fillna() incorrectly returning an unfilled Index when NA values are present and downcast argument is specified. This now raises NotImplementedError instead; do not pass downcast argument (GH44873)
- Bug in DataFrame.dropna() changing Index even if no entries were dropped (GH41965)
- Bug in Series.fillna() with an object-dtype incorrectly ignoring downcast="infer" (GH44241)
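The first entry above concerns fillna() with a limit applied along columns. The following is a minimal sketch of that call, assuming pandas 1.4 or later; the frame is invented, and the comments describe the behavior expected after the fix rather than a guaranteed result on every version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [np.nan, np.nan], "b": [np.nan, 2.0], "c": [1.0, np.nan]}
)

# With axis="columns" the limit is counted along each row:
# only the first missing value in every row is filled
filled = df.fillna(0, limit=1, axis="columns")

print(filled)
```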
MultiIndex#
- Bug in MultiIndex.get_loc() where the first level is a DatetimeIndex and a string key is passed (GH42465)
- Bug in MultiIndex.reindex() when passing a level that corresponds to an ExtensionDtype level (GH42043)
- Bug in MultiIndex.get_loc() raising TypeError instead of KeyError on nested tuple (GH42440)
- Bug in MultiIndex.union() setting wrong sortorder causing errors in subsequent indexing operations with slices (GH44752)
- Bug in MultiIndex.putmask() where the other value was also a MultiIndex (GH43212)
- Bug in MultiIndex.dtypes() where duplicate level names returned only one dtype per name (GH45174)
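The duplicate-level-name case in the last entry can be sketched as below, assuming pandas 1.4 or later. The level name "x" and the data are illustrative only.

```python
import pandas as pd

# Both levels deliberately share the same name
mi = pd.MultiIndex.from_arrays([[1, 2], ["a", "b"]], names=["x", "x"])

# One dtype entry is reported per level, even though the names collide
level_dtypes = mi.dtypes
print(level_dtypes)
```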
I/O#
- Bug in read_excel() attempting to read chart sheets from .xlsx files (GH41448)
- Bug in json_normalize() where errors="ignore" could fail to ignore missing values of meta when record_path has a length greater than one (GH41876)
- Bug in read_csv() with multi-header input and arguments referencing column names as tuples (GH42446)
- Bug in read_fwf(), where difference in lengths of colspecs and names was not raising ValueError (GH40830)
- Bug in Series.to_json() and DataFrame.to_json() where some attributes were skipped when serializing plain Python objects to JSON (GH42768, GH33043)
- Bug where column headers were dropped when constructing a DataFrame from a sqlalchemy Row object (GH40682)
- Bug in unpickling an Index with object dtype incorrectly inferring numeric dtypes (GH43188)
- Bug in read_csv() where reading multi-header input with unequal lengths incorrectly raised IndexError (GH43102)
- Bug in read_csv() raising ParserError when reading file in chunks and some chunk blocks have fewer columns than header for engine="c" (GH21211)
- Bug in read_csv(), changed exception class when expecting a file path name or file-like object from OSError to TypeError (GH43366)
- Bug in read_csv() and read_fwf() ignoring all skiprows except first when nrows is specified for engine='python' (GH44021, GH10261)
- Bug in read_csv() keeping the original column in object format when keep_date_col=True is set (GH13378)
- Bug in read_json() not handling non-numpy dtypes correctly (especially category) (GH21892, GH33205)
- Bug in json_normalize() where multi-character sep parameter is incorrectly prefixed to every key (GH43831)
- Bug in json_normalize() where reading data with missing multi-level metadata would not respect errors="ignore" (GH44312)
- Bug in read_csv() using the second row to guess the implicit index if header was set to None for engine="python" (GH22144)
- Bug in read_csv() not recognizing bad lines when names were given for engine="c" (GH22144)
- Bug in read_csv() with float_precision="round_trip" which did not skip initial/trailing whitespace (GH43713)
- Bug when Python is built without the lzma module: a warning was raised at the pandas import time, even if the lzma capability isn’t used (GH43495)
- Bug in read_csv() not applying dtype for index_col (GH9435)
- Bug in dumping/loading a DataFrame with yaml.dump(frame) (GH42748)
- Bug in read_csv() raising ValueError when names was longer than header but equal to data rows for engine="python" (GH38453)
- Bug in ExcelWriter, where engine_kwargs were not passed through to all engines (GH43442)
- Bug in read_csv() raising ValueError when parse_dates was used with MultiIndex columns (GH8991)
- Bug in read_csv() not raising a ValueError when \n was specified as delimiter or sep which conflicts with lineterminator (GH43528)
- Bug in to_csv() converting datetimes in categorical Series to integers (GH40754)
- Bug in read_csv() converting columns to numeric after date parsing failed (GH11019)
- Bug in read_csv() not replacing NaN values with np.nan before attempting date conversion (GH26203)
- Bug in read_csv() raising AttributeError when attempting to read a .csv file and infer index column dtype from a nullable integer type (GH44079)
- Bug in to_csv() always coercing datetime columns with different formats to the same format (GH21734)
- DataFrame.to_csv() and Series.to_csv() with compression set to 'zip' no longer create a zip file containing a file ending with “.zip”. Instead, they try to infer the inner file name more smartly (GH39465)
- Bug in read_csv() where reading a mixed column of booleans and missing values to a float type results in the missing values becoming 1.0 rather than NaN (GH42808, GH34120)
- Bug in to_xml() raising error for pd.NA with extension array dtype (GH43903)
- Bug in read_csv() where the parser passed in date_parser was still called when parse_dates=False was passed at the same time (GH44366)
- Bug in read_csv() not setting name of MultiIndex columns correctly when index_col is not the first column (GH38549)
- Bug in read_csv() silently ignoring errors when failing to create a memory-mapped file (GH44766)
- Bug in read_csv() when passing a tempfile.SpooledTemporaryFile opened in binary mode (GH44748)
- Bug in read_json() raising ValueError when attempting to parse json strings containing “://” (GH36271)
- Bug in read_csv() when engine="c" and encoding_errors=None which caused a segfault (GH45180)
- Bug in read_csv() where an invalid value of usecols led to an unclosed file handle (GH45384)
- Fixed memory leak in DataFrame.to_json() (GH43877)
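Two of the read_csv() fixes above can be exercised with in-memory data: dtype being applied to index_col, and \n being rejected as a separator. This is a minimal sketch, assuming pandas 1.4 or later; the CSV text is invented for the example.

```python
from io import StringIO

import pandas as pd

data = "a,b\n1,2\n3,4\n"

# The dtype requested for column "a" is applied even though it becomes the index
df = pd.read_csv(StringIO(data), index_col="a", dtype={"a": "float64"})

# Using the line terminator as the separator is rejected with a ValueError
try:
    pd.read_csv(StringIO(data), sep="\n")
except ValueError as err:
    sep_error = type(err).__name__
else:
    sep_error = None

print(df.index.dtype)
print(sep_error)
```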
Period#
- Bug in adding a Period object to a np.timedelta64 object incorrectly raising TypeError (GH44182)
- Bug in PeriodIndex.to_timestamp() when the index has freq="B" inferring freq="D" for its result instead of freq="B" (GH44105)
- Bug in Period constructor incorrectly allowing np.timedelta64("NaT") (GH44507)
- Bug in PeriodIndex.to_timestamp() giving incorrect values for indexes with non-contiguous data (GH44100)
- Bug in Series.where() with PeriodDtype incorrectly raising when the where call should not replace anything (GH45135)
Plotting#
- When given non-numeric data, DataFrame.boxplot() now raises a ValueError rather than a cryptic KeyError or ZeroDivisionError, in line with other plotting functions like DataFrame.hist() (GH43480)
Groupby/resample/rolling#
- Bug in SeriesGroupBy.apply() where passing an unrecognized string argument failed to raise TypeError when the underlying Series is empty (GH42021)
- Bug in Series.rolling.apply(), DataFrame.rolling.apply(), Series.expanding.apply() and DataFrame.expanding.apply() with engine="numba" where *args were being cached with the user passed function (GH42287)
- Bug in GroupBy.max() and GroupBy.min() with nullable integer dtypes losing precision (GH41743)
- Bug in DataFrame.groupby.rolling.var() calculating the rolling variance only on the first group (GH42442)
- Bug in GroupBy.shift() that would return the grouping columns if fill_value was not None (GH41556)
- Bug in SeriesGroupBy.nlargest() and SeriesGroupBy.nsmallest() having an inconsistent index when the input Series was sorted and n was greater than or equal to all group sizes (GH15272, GH16345, GH29129)
- Bug in pandas.DataFrame.ewm(), where non-float64 dtypes were silently failing (GH42452)
- Bug in pandas.DataFrame.rolling() operation along rows (axis=1) incorrectly omitting columns containing float16 and float32 (GH41779)
- Bug in Resampler.aggregate() not allowing the use of Named Aggregation (GH32803)
- Bug in Series.rolling() when the Series dtype was Int64 (GH43016)
- Bug in DataFrame.rolling.corr() when the DataFrame columns were a MultiIndex (GH21157)
- Bug in DataFrame.groupby.rolling() where specifying on and calling __getitem__ would subsequently return incorrect results (GH43355)
- Bug in GroupBy.apply() with time-based Grouper objects incorrectly raising ValueError in corner cases where the grouping vector contains a NaT (GH43500, GH43515)
- Bug in GroupBy.mean() failing with complex dtype (GH43701)
- Bug in Series.rolling() and DataFrame.rolling() not calculating window bounds correctly for the first row when center=True and index is decreasing (GH43927)
- Bug in Series.rolling() and DataFrame.rolling() for centered datetimelike windows with uneven nanosecond (GH43997)
- Bug in GroupBy.mean() raising KeyError when column was selected at least twice (GH44924)
- Bug in GroupBy.nth() failing on axis=1 (GH43926)
- Bug in Series.rolling() and DataFrame.rolling() not respecting right bound on centered datetime-like windows, if the index contains duplicates (GH3944)
- Bug in Series.rolling() and DataFrame.rolling() where using a pandas.api.indexers.BaseIndexer subclass that returned unequal start and end arrays would segfault instead of raising a ValueError (GH44470)
- Bug in GroupBy.nunique() not respecting observed=True for categorical grouping columns (GH45128)
- Bug in GroupBy.head() and GroupBy.tail() not dropping groups with NaN when dropna=True (GH45089)
- Bug in GroupBy.__iter__() after selecting a subset of columns in a GroupBy object, which returned all columns instead of the chosen subset (GH44821)
- Bug in GroupBy.rolling() failing to correctly raise ValueError when non-monotonic data is passed (GH43909)
- Bug where grouping by a Series that has a categorical data type and length unequal to the axis of grouping raised ValueError (GH44179)
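The GroupBy.__iter__() entry above can be illustrated by iterating over a column subset. This is a minimal sketch, assuming pandas 1.4 or later; the column names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"key": ["x", "x", "y"], "b": [1, 2, 3], "c": [4, 5, 6]})

# Iterating after selecting [["b"]] yields groups restricted to that subset
group_columns = [list(group.columns) for _, group in df.groupby("key")[["b"]]]

print(group_columns)
```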
Reshaping#
- Improved error message when creating a DataFrame column from a multi-dimensional numpy.ndarray (GH42463)
- Bug in concat() creating MultiIndex with duplicate level entries when concatenating a DataFrame with duplicates in Index and multiple keys (GH42651)
- Bug in pandas.cut() on Series with duplicate indices and non-exact pandas.CategoricalIndex() (GH42185, GH42425)
- Bug in DataFrame.append() failing to retain dtypes when appended columns do not match (GH43392)
- Bug in concat() of bool and boolean dtypes resulting in object dtype instead of boolean dtype (GH42800)
- Bug in crosstab() when inputs are categorical Series, there are categories that are not present in one or both of the Series, and margins=True. Previously the margin value for missing categories was NaN. It is now correctly reported as 0 (GH43505)
- Bug in concat() would fail when the objs argument all had the same index and the keys argument contained duplicates (GH43595)
- Bug in merge() with MultiIndex as column index for the on argument returning an error when assigning a column internally (GH43734)
- Bug in crosstab() would fail when inputs are lists or tuples (GH44076)
- Bug in DataFrame.append() failing to retain index.name when appending a list of Series objects (GH44109)
- Fixed metadata propagation in DataFrame.apply() method, consequently fixing the same issue for DataFrame.transform(), DataFrame.nunique() and DataFrame.mode() (GH28283)
- Bug in concat() casting levels of MultiIndex to float if all levels only consist of missing values (GH44900)
- Bug in DataFrame.stack() with ExtensionDtype columns incorrectly raising (GH43561)
- Bug in merge() raising KeyError when joining over differently named indexes with on keywords (GH45094)
- Bug in Series.unstack() with object dtype doing unwanted type inference on resulting columns (GH44595)
- Bug in MultiIndex.join() with overlapping IntervalIndex levels (GH44096)
- Bug in DataFrame.replace() and Series.replace() where the result had a different dtype depending on the regex parameter (GH44864)
- Bug in DataFrame.pivot() with index=None when the DataFrame index was a MultiIndex (GH23955)
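Two of the reshaping fixes above, concat() of bool and boolean dtypes and crosstab() with plain lists, can be sketched as follows. The example assumes pandas 1.4 or later and uses invented data.

```python
import pandas as pd

# Concatenating NumPy bool with nullable boolean keeps the nullable dtype
combined = pd.concat(
    [
        pd.Series([True, False], dtype="bool"),
        pd.Series([True, pd.NA], dtype="boolean"),
    ],
    ignore_index=True,
)

# crosstab() accepts plain Python lists as inputs
table = pd.crosstab(["a", "b", "a"], ["x", "y", "x"])

print(combined.dtype)
print(table)
```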
Sparse#
- Bug in DataFrame.sparse.to_coo() raising AttributeError when column names are not unique (GH29564)
- Bug in SparseArray.max() and SparseArray.min() raising ValueError for arrays with 0 non-null elements (GH43527)
- Bug in DataFrame.sparse.to_coo() silently converting non-zero fill values to zero (GH24817)
- Bug in SparseArray comparison methods with an array-like operand of mismatched length raising AssertionError or unclear ValueError depending on the input (GH43863)
- Bug in SparseArray arithmetic methods floordiv and mod behaviors when dividing by zero not matching the non-sparse Series behavior (GH38172)
- Bug in SparseArray unary methods as well as SparseArray.isna() not recalculating indexes (GH44955)
ExtensionArray#
- NumPy ufuncs np.abs, np.positive, np.negative now correctly preserve dtype when called on ExtensionArrays that implement __abs__, __pos__, __neg__, respectively. In particular this is fixed for TimedeltaArray (GH43899, GH23316)
- NumPy ufuncs np.minimum.reduce, np.maximum.reduce, np.add.reduce, and np.prod.reduce now work correctly instead of raising NotImplementedError on Series with IntegerDtype or FloatDtype (GH43923, GH44793)
- NumPy ufuncs with out keyword are now supported by arrays with IntegerDtype and FloatingDtype (GH45122)
- Avoid raising PerformanceWarning about fragmented DataFrame when using many columns with an extension dtype (GH44098)
- Bug in IntegerArray and FloatingArray construction incorrectly coercing mismatched NA values (e.g. np.timedelta64("NaT")) to numeric NA (GH44514)
- Bug in BooleanArray.__eq__() and BooleanArray.__ne__() raising TypeError on comparison with an incompatible type (like a string). This caused DataFrame.replace() to sometimes raise a TypeError if a nullable boolean column was included (GH44499)
- Bug in array() incorrectly raising when passed a ndarray with float16 dtype (GH44715)
- Bug in calling np.sqrt on BooleanArray returning a malformed FloatingArray (GH44715)
- Bug in Series.where() with ExtensionDtype when other is a NA scalar incompatible with the Series dtype (e.g. NaT with a numeric dtype) incorrectly casting to a compatible NA value (GH44697)
- Bug in Series.replace() where explicitly passing value=None was treated as if no value was passed, so None was not in the result (GH36984, GH19998)
- Bug in Series.replace() with unwanted downcasting being done in no-op replacements (GH44498)
- Bug in Series.replace() with FloatDtype, string[python], or string[pyarrow] dtype not being preserved when possible (GH33484, GH40732, GH31644, GH41215, GH25438)
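The first two entries above concern NumPy ufuncs on extension-backed data. The sketch below, assuming pandas 1.4 or later, reduces a nullable-integer Series with np.add.reduce and applies np.abs to a TimedeltaArray; the values are illustrative.

```python
import numpy as np
import pandas as pd

# Reduction ufunc on a Series with nullable integer dtype
ints = pd.Series([1, 2, 3], dtype="Int64")
total = np.add.reduce(ints)

# np.abs on a TimedeltaArray returns a TimedeltaArray rather than losing the dtype
tda = pd.to_timedelta(["-1 days", "2 days"]).array
magnitudes = np.abs(tda)

print(total)
print(type(magnitudes).__name__)
```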
Styler#
- Bug in Styler where the uuid at initialization maintained a floating underscore (GH43037)
- Bug in Styler.to_html() where the Styler object was updated if the to_html method was called with some args (GH43034)
- Bug in Styler.copy() where uuid was not previously copied (GH40675)
- Bug in Styler.apply() where functions which returned Series objects were not correctly handled in terms of aligning their index labels (GH13657, GH42014)
- Bug when rendering an empty DataFrame with a named Index (GH43305)
- Bug when rendering a single level MultiIndex (GH43383)
- Bug when combining non-sparse rendering and Styler.hide_columns() or Styler.hide_index() (GH43464)
- Bug setting a table style when using multiple selectors in Styler (GH44011)
- Bugs where row trimming and column trimming failed to reflect hidden rows (GH43703, GH44247)
Other#
- Bug in DataFrame.astype() with non-unique columns and a Series dtype argument (GH44417)
- Bug in CustomBusinessMonthBegin.__add__() (CustomBusinessMonthEnd.__add__()) not applying the extra offset parameter when beginning (end) of the target month is already a business day (GH41356)
- Bug in RangeIndex.union() with another RangeIndex with matching (even) step and starts differing by strictly less than step / 2 (GH44019)
- Bug in RangeIndex.difference() with sort=None and step<0 failing to sort (GH44085)
- Bug in Series.replace() and DataFrame.replace() with value=None and ExtensionDtypes (GH44270, GH37899)
- Bug in FloatingArray.equals() failing to consider two arrays equal if they contain np.nan values (GH44382)
- Bug in DataFrame.shift() with axis=1 and ExtensionDtype columns incorrectly raising when an incompatible fill_value is passed (GH44564)
- Bug in DataFrame.shift() with axis=1 and periods larger than len(frame.columns) producing an invalid DataFrame (GH44978)
- Bug in DataFrame.diff() when passing a NumPy integer object instead of an int object (GH44572)
- Bug in Series.replace() raising ValueError when using regex=True with a Series containing np.nan values (GH43344)
- Bug in DataFrame.to_records() where an incorrect n was used when missing names were replaced by level_n (GH44818)
- Bug in DataFrame.eval() where resolvers argument was overriding the default resolvers (GH34966)
- Series.__repr__() and DataFrame.__repr__() no longer replace all null-values in indexes with “NaN” but use their real string-representations. “NaN” is used only for float("nan") (GH45263)
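The DataFrame.diff() entry above can be checked by passing a NumPy integer as the number of periods. This is a minimal sketch, assuming pandas 1.4 or later; the data is invented.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 3, 6]})

# periods given as a NumPy integer rather than a built-in int
deltas = df.diff(np.int64(1))

print(deltas["a"].tolist())
```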
Contributors#
A total of 275 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.
- Abhishek R 
- Albert Villanova del Moral 
- Alessandro Bisiani + 
- Alex Lim 
- Alex-Gregory-1 + 
- Alexander Gorodetsky 
- Alexander Regueiro + 
- Alexey Györi 
- Alexis Mignon 
- Aleš Erjavec 
- Ali McMaster 
- Alibi + 
- Andrei Batomunkuev + 
- Andrew Eckart + 
- Andrew Hawyrluk 
- Andrew Wood 
- Anton Lodder + 
- Armin Berres + 
- Arushi Sharma + 
- Benedikt Heidrich + 
- Beni Bienz + 
- Benoît Vinot 
- Bert Palm + 
- Boris Rumyantsev + 
- Brian Hulette 
- Brock 
- Bruno Costa + 
- Bryan Racic + 
- Caleb Epstein 
- Calvin Ho 
- ChristofKaufmann + 
- Christopher Yeh + 
- Chuliang Xiao + 
- ClaudiaSilver + 
- DSM 
- Daniel Coll + 
- Daniel Schmidt + 
- Dare Adewumi 
- David + 
- David Sanders + 
- David Wales + 
- Derzan Chiang + 
- DeviousLab + 
- Dhruv B Shetty + 
- Digres45 + 
- Dominik Kutra + 
- Drew Levitt + 
- DriesS 
- EdAbati 
- Elle 
- Elliot Rampono 
- Endre Mark Borza 
- Erfan Nariman 
- Evgeny Naumov + 
- Ewout ter Hoeven + 
- Fangchen Li 
- Felix Divo 
- Felix Dulys + 
- Francesco Andreuzzi + 
- Francois Dion + 
- Frans Larsson + 
- Fred Reiss 
- GYvan 
- Gabriel Di Pardi Arruda + 
- Gesa Stupperich 
- Giacomo Caria + 
- Greg Siano + 
- Griffin Ansel 
- Hiroaki Ogasawara + 
- Horace + 
- Horace Lai + 
- Irv Lustig 
- Isaac Virshup 
- JHM Darbyshire (MBP) 
- JHM Darbyshire (iMac) 
- JHM Darbyshire + 
- Jack Liu 
- Jacob Skwirsk + 
- Jaime Di Cristina + 
- James Holcombe + 
- Janosh Riebesell + 
- Jarrod Millman 
- Jason Bian + 
- Jeff Reback 
- Jernej Makovsek + 
- Jim Bradley + 
- Joel Gibson + 
- Joeperdefloep + 
- Johannes Mueller + 
- John S Bogaardt + 
- John Zangwill + 
- Jon Haitz Legarreta Gorroño + 
- Jon Wiggins + 
- Jonas Haag + 
- Joris Van den Bossche 
- Josh Friedlander 
- José Duarte + 
- Julian Fleischer + 
- Julien de la Bruère-T 
- Justin McOmie 
- Kadatatlu Kishore + 
- Kaiqi Dong 
- Kashif Khan + 
- Kavya9986 + 
- Kendall + 
- Kevin Sheppard 
- Kiley Hewitt 
- Koen Roelofs + 
- Krishna Chivukula 
- KrishnaSai2020 
- Leonardo Freua + 
- Leonardus Chen 
- Liang-Chi Hsieh + 
- Loic Diridollou + 
- Lorenzo Maffioli + 
- Luke Manley + 
- LunarLanding + 
- Marc Garcia 
- Marcel Bittar + 
- Marcel Gerber + 
- Marco Edward Gorelli 
- Marco Gorelli 
- MarcoGorelli 
- Marvin + 
- Mateusz Piotrowski + 
- Mathias Hauser + 
- Matt Richards + 
- Matthew Davis + 
- Matthew Roeschke 
- Matthew Zeitlin 
- Matthias Bussonnier 
- Matti Picus 
- Mauro Silberberg + 
- Maxim Ivanov 
- Maximilian Carr + 
- MeeseeksMachine 
- Michael Sarrazin + 
- Michael Wang + 
- Michał Górny + 
- Mike Phung + 
- Mike Taves + 
- Mohamad Hussein Rkein + 
- NJOKU OKECHUKWU VALENTINE + 
- Neal McBurnett + 
- Nick Anderson + 
- Nikita Sobolev + 
- Olivier Cavadenti + 
- PApostol + 
- Pandas Development Team 
- Patrick Hoefler 
- Peter 
- Peter Tillmann + 
- Prabha Arivalagan + 
- Pradyumna Rahul 
- Prerana Chakraborty 
- Prithvijit + 
- Rahul Gaikwad + 
- Ray Bell 
- Ricardo Martins + 
- Richard Shadrach 
- Robbert-jan ‘t Hoen + 
- Robert Voyer + 
- Robin Raymond + 
- Rohan Sharma + 
- Rohan Sirohia + 
- Roman Yurchak 
- Ruan Pretorius + 
- Sam James + 
- Scott Talbert 
- Shashwat Sharma + 
- Sheogorath27 + 
- Shiv Gupta 
- Shoham Debnath 
- Simon Hawkins 
- Soumya + 
- Stan West + 
- Stefanie Molin + 
- Stefano Alberto Russo + 
- Stephan Heßelmann 
- Stephen 
- Suyash Gupta + 
- Sven 
- Swanand01 + 
- Sylvain Marié + 
- TLouf 
- Tania Allard + 
- Terji Petersen 
- TheDerivator + 
- Thomas Dickson 
- Thomas Kastl + 
- Thomas Kluyver 
- Thomas Li 
- Thomas Smith 
- Tim Swast 
- Tim Tran + 
- Tobias McNulty + 
- Tobias Pitters 
- Tomoki Nakagawa + 
- Tony Hirst + 
- Torsten Wörtwein 
- V.I. Wood + 
- Vaibhav K + 
- Valentin Oliver Loftsson + 
- Varun Shrivastava + 
- Vivek Thazhathattil + 
- Vyom Pathak 
- Wenjun Si 
- William Andrea + 
- William Bradley + 
- Wojciech Sadowski + 
- Yao-Ching Huang + 
- Yash Gupta + 
- Yiannis Hadjicharalambous + 
- Yoshiki Vázquez Baeza 
- Yuanhao Geng 
- Yury Mikhaylov 
- Yvan Gatete + 
- Yves Delley + 
- Zach Rait 
- Zbyszek Królikowski + 
- Zero + 
- Zheyuan 
- Zhiyi Wu + 
- aiudirog 
- ali sayyah + 
- aneesh98 + 
- aptalca 
- arw2019 + 
- attack68 
- brendandrury + 
- bubblingoak + 
- calvinsomething + 
- claws + 
- deponovo + 
- dicristina 
- el-g-1 + 
- evensure + 
- fotino21 + 
- fshi01 + 
- gfkang + 
- github-actions[bot] 
- i-aki-y 
- jbrockmendel 
- jreback 
- juliandwain + 
- jxb4892 + 
- kendall smith + 
- lmcindewar + 
- lrepiton 
- maximilianaccardo + 
- michal-gh 
- neelmraman 
- partev 
- phofl + 
- pratyushsharan + 
- quantumalaviya + 
- rafael + 
- realead 
- rocabrera + 
- rosagold 
- saehuihwang + 
- salomondush + 
- shubham11941140 + 
- srinivasan + 
- stphnlyd 
- suoniq 
- trevorkask + 
- tushushu 
- tyuyoshi + 
- usersblock + 
- vernetya + 
- vrserpa + 
- willie3838 + 
- zeitlinv + 
- zhangxiaoxing +