pandas arrays, scalars, and data types#
For most data types, pandas uses NumPy arrays as the concrete
objects contained within an Index, Series, or
DataFrame.
For some data types, pandas extends NumPy’s type system. String aliases for these types can be found at dtypes.
| Kind of Data | pandas Data Type | Scalar | Array | 
|---|---|---|---|
| TZ-aware datetime | |||
| Timedeltas | (none) | ||
| Period (time spans) | |||
| Intervals | |||
| Nullable Integer | | (none) | |
| Categorical | (none) | ||
| Sparse | (none) | ||
| Strings | |||
| Boolean (with NA) | |||
pandas and third-party libraries can extend NumPy’s type system (see Extension types).
The top-level array() function can be used to create a new array, which may be
stored in a Series, an Index, or as a column in a DataFrame.
| Create an array. | |
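As a minimal sketch of the above, the following creates an extension array with `pd.array` and stores it in a Series and a DataFrame. The values and the column name `"a"` are illustrative assumptions, not taken from the reference.

```python
import pandas as pd

# Passing dtype="Int64" asks pandas for its nullable integer extension array.
int_arr = pd.array([1, 2, None], dtype="Int64")

# The same array can back a Series or a DataFrame column.
ser = pd.Series(int_arr)
df = pd.DataFrame({"a": int_arr})  # "a" is a hypothetical column name

print(type(int_arr).__name__)
print(ser.dtype, df["a"].dtype)
```

Without an explicit `dtype`, `pd.array` infers one from the data, so spelling it out is usually the more predictable choice.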
Datetime data#
NumPy cannot natively represent timezone-aware datetimes. pandas supports this
with the arrays.DatetimeArray extension array, which can hold timezone-naive
or timezone-aware values.
Timestamp, a subclass of datetime.datetime, is pandas’
scalar type for timezone-naive or timezone-aware datetime data.
| Pandas replacement for python datetime.datetime object. | |
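A short sketch of how a Timestamp behaves as both a naive and a timezone-aware scalar. The dates and time zones below are arbitrary examples chosen for illustration.

```python
import datetime

import pandas as pd

# A timezone-naive Timestamp; it is also a datetime.datetime instance.
naive = pd.Timestamp("2024-03-15 13:45:30")
is_datetime = isinstance(naive, datetime.datetime)

# Attach a time zone, then convert to another one.
aware = naive.tz_localize("UTC")
tokyo = aware.tz_convert("Asia/Tokyo")

# Floor to the start of the day.
floored = naive.floor("D")

print(naive.tz, aware.tz, tokyo)
print(floored)
```

Note that `tz_localize` attaches a zone without shifting the wall-clock time, while `tz_convert` changes the wall-clock time to express the same instant in another zone.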
Properties#
| Return numpy datetime64 format in nanoseconds. | |
| Return day of the week. | |
| Return day of the week. | |
| Return the day of the year. | |
| Return the day of the year. | |
| Return the number of days in the month. | |
| Return the number of days in the month. | |
| Return True if year is a leap year. | |
| Return True if date is last day of month. | |
| Return True if date is first day of month. | |
| Return True if date is last day of the quarter. | |
| Return True if date is first day of the quarter. | |
| Return True if date is last day of the year. | |
| Return True if date is first day of the year. | |
| Return the quarter of the year. | |
| Alias for tzinfo. | |
| Return the week number of the year. | |
| Return the week number of the year. | |
Methods#
| Convert timezone-aware Timestamp to another time zone. | |
| Return a new Timestamp ceiled to this resolution. | |
| Combine date, time into datetime with same date and time fields. | |
| Return ctime() style string. | |
| Return date object with same year, month and day. | |
| Return the day name of the Timestamp with specified locale. | |
| Return self.tzinfo.dst(self). | |
| Return a new Timestamp floored to this resolution. | |
| Return the total number of days in the month. | |
| Passed an ordinal, translate and convert to a ts. | |
| Transform timestamp[, tz] to tz's local time from POSIX timestamp. | |
| Return a 3-tuple containing ISO year, week number, and weekday. | |
| Return the time formatted according to ISO 8601. | |
| Return the day of the week represented by the date. | |
| Return the month name of the Timestamp with specified locale. | |
| Normalize Timestamp to midnight, preserving tz information. | |
| Return new Timestamp object representing current time local to tz. | |
| Implements datetime.replace, handles nanoseconds. | |
| Round the Timestamp to the specified resolution. | |
| Return a string representing the given POSIX timestamp controlled by an explicit format string. | |
| Function is not implemented. | |
| Return time object with same time but with tzinfo=None. | |
| Return POSIX timestamp as float. | |
| Return time tuple, compatible with time.localtime(). | |
| Return time object with same time and tzinfo. | |
| Return a numpy.datetime64 object with 'ns' precision. | |
| Convert the Timestamp to a NumPy datetime64. | |
| Convert Timestamp to a Julian Date. | |
| Return a period of which this timestamp is an observation. | |
| Convert a Timestamp object to a native Python datetime object. | |
| Return the current time in the local timezone. | |
| Return proleptic Gregorian ordinal. | |
| Convert timezone-aware Timestamp to another time zone. | |
| Convert naive Timestamp to local time zone, or remove timezone from timezone-aware Timestamp. | |
| Return self.tzinfo.tzname(self). | |
| Construct a naive UTC datetime from a POSIX timestamp. | |
| Return a new Timestamp representing UTC day and time. | |
| Return self.tzinfo.utcoffset(self). | |
| Return UTC time tuple, compatible with time.localtime(). | |
| Return the day of the week represented by the date. | 
A collection of timestamps may be stored in an arrays.DatetimeArray.
For timezone-aware data, the .dtype of an arrays.DatetimeArray is a
DatetimeTZDtype. For timezone-naive data, np.dtype("datetime64[ns]")
is used.
If the data are timezone-aware, then every value in the array must have the same timezone.
| Pandas ExtensionArray for tz-naive or tz-aware datetime data. | |
| An ExtensionDtype for timezone-aware datetime data. | |
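The dtype difference between naive and aware data can be checked directly. This is a sketch with made-up dates and an arbitrarily chosen zone; the exact datetime64 resolution of the naive dtype may depend on the pandas version, so the code only checks its kind.

```python
import numpy as np
import pandas as pd

naive = pd.Series(pd.to_datetime(["2024-01-01", "2024-01-02"]))
aware = naive.dt.tz_localize("Europe/Berlin")

# Series.array exposes the underlying DatetimeArray in both cases.
naive_arr = naive.array
aware_arr = aware.array

# Naive data uses a plain NumPy datetime64 dtype.
print(type(naive_arr).__name__, naive_arr.dtype)

# Aware data uses DatetimeTZDtype, which carries one time zone for the whole array.
print(type(aware_arr).__name__, aware_arr.dtype)
```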
Timedelta data#
NumPy can natively represent timedeltas. pandas provides Timedelta
for symmetry with Timestamp.
| Represents a duration, the difference between two dates or times. | |
Properties#
| Return a numpy timedelta64 array scalar view. | |
| Return a components namedtuple-like. | |
| Number of days. | |
| Return the timedelta in nanoseconds (ns), for internal compatibility. | |
| Number of microseconds (>= 0 and less than 1 second). | |
| Return the number of nanoseconds (n), where 0 <= n < 1 microsecond. | |
| Number of seconds (>= 0 and less than 1 day). | |
| Array view compatibility. | 
Methods#
| Return a new Timedelta ceiled to this resolution. | |
| Return a new Timedelta floored to this resolution. | |
| Format Timedelta as ISO 8601 Duration like  | |
| Round the Timedelta to the specified resolution. | |
| Convert a pandas Timedelta object into a python  | |
| Return a numpy.timedelta64 object with 'ns' precision. | |
| Convert the Timedelta to a NumPy timedelta64. | |
| Total seconds in the duration. | 
A collection of Timedelta objects may be stored in a TimedeltaArray.
| Pandas ExtensionArray for timedelta data. | |
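A brief sketch of the Timedelta scalar and its array counterpart. The durations are arbitrary illustrative values.

```python
import datetime

import pandas as pd

td = pd.Timedelta(days=1, hours=2, minutes=30)

# Component-style properties: whole days, then leftover seconds within the day.
print(td.days, td.seconds)

# Total duration in seconds, and conversion to the standard-library type.
total = td.total_seconds()
py_td = td.to_pytimedelta()

# Subtracting two Timestamps yields a Timedelta.
diff = pd.Timestamp("2024-01-02") - pd.Timestamp("2024-01-01")

# A collection of timedeltas is backed by a TimedeltaArray.
td_arr = pd.to_timedelta(["1 day", "90 min"]).array
print(type(td_arr).__name__)
```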
Timespan data#
pandas represents spans of times as Period objects.
Period#
| Represents a period of time. | |
Properties#
| Get day of the month that a Period falls on. | |
| Day of the week the period lies in, with Monday=0 and Sunday=6. | |
| Day of the week the period lies in, with Monday=0 and Sunday=6. | |
| Return the day of the year. | |
| Return the day of the year. | |
| Get the total number of days in the month that this period falls on. | |
| Get the total number of days of the month that the Period falls in. | |
| Get the Timestamp for the end of the period. | |
| Return a string representation of the frequency. | |
| Get the hour of the day component of the Period. | |
| Return True if the period's year is in a leap year. | |
| Get minute of the hour component of the Period. | |
| Return the month this Period falls on. | |
| Return the quarter this Period falls on. | |
| Fiscal year the Period lies in according to its starting-quarter. | |
| Get the second component of the Period. | |
| Get the Timestamp for the start of the period. | |
| Get the week of the year on the given Period. | |
| Day of the week the period lies in, with Monday=0 and Sunday=6. | |
| Get the week of the year on the given Period. | |
| Return the year this Period falls on. | 
Methods#
| Convert Period to desired frequency, at the start or end of the interval. | |
| Return the period of now's date. | |
| Returns the string representation of the  | |
| Return the Timestamp representation of the Period. | 
A collection of Period objects may be stored in an arrays.PeriodArray.
Every period in an arrays.PeriodArray must have the same freq.
| Pandas ExtensionArray for storing Period data. | |
| An ExtensionDtype for Period data. | |
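To illustrate, here is a sketch of a monthly Period and a PeriodArray whose elements share one frequency. The months used are arbitrary.

```python
import pandas as pd

# A Period represents a whole span of time, here the month of March 2024.
p = pd.Period("2024-03", freq="M")
print(p.start_time, p.days_in_month)

# Convert to daily frequency, anchored at the start of the span.
first_day = p.asfreq("D", how="start")

# Periods with the same freq can be collected into a PeriodArray.
parr = pd.array([pd.Period("2024-01", freq="M"), pd.Period("2024-02", freq="M")])
print(type(parr).__name__, parr.dtype)
```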
Interval data#
Arbitrary intervals can be represented as Interval objects.
| Immutable object implementing an Interval, a bounded slice-like interval. | 
Properties#
| Whether the interval is closed on the left-side, right-side, both or neither. | |
| Check if the interval is closed on the left side. | |
| Check if the interval is closed on the right side. | |
| Indicates if an interval is empty, meaning it contains no points. | |
| Left bound for the interval. | |
| Return the length of the Interval. | |
| Return the midpoint of the Interval. | |
| Check if the interval is open on the left side. | |
| Check if the interval is open on the right side. | |
| Check whether two Interval objects overlap. | |
| Right bound for the interval. | 
A collection of intervals may be stored in an arrays.IntervalArray.
| Pandas array for interval data that are closed on the same side. | |
| An ExtensionDtype for Interval data. | |
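A small sketch of Interval scalars and an IntervalArray. The bounds are illustrative, and `from_breaks` is just one of several ways to construct the array.

```python
import pandas as pd

# An interval closed on the right only: 0 < x <= 5.
iv = pd.Interval(0, 5, closed="right")

print(2 in iv, 0 in iv)
print(iv.length, iv.mid)

# overlaps() reports whether two intervals share at least one point.
shares_points = iv.overlaps(pd.Interval(4, 8))

# Every interval in an IntervalArray is closed on the same side.
arr = pd.arrays.IntervalArray.from_breaks([0, 1, 2, 3])
print(arr.closed, len(arr))
```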
Nullable integer#
numpy.ndarray cannot natively represent integer data with missing values.
pandas provides this through arrays.IntegerArray.
| Array of integer (optional missing) values. | |
| An ExtensionDtype for int8 integer data. | |
| An ExtensionDtype for int16 integer data. | |
| An ExtensionDtype for int32 integer data. | |
| An ExtensionDtype for int64 integer data. | |
| An ExtensionDtype for uint8 integer data. | |
| An ExtensionDtype for uint16 integer data. | |
| An ExtensionDtype for uint32 integer data. | |
| An ExtensionDtype for uint64 integer data. | 
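The following sketch contrasts the default NumPy-backed behavior with the nullable integer dtype. The values are made up for illustration.

```python
import pandas as pd

# With the default inference, a missing value forces integers to float64.
float_based = pd.Series([1, None, 3])
print(float_based.dtype)

# With the nullable "Int64" dtype, the data stay integers and the gap is <NA>.
arr = pd.array([1, None, 3], dtype="Int64")
print(arr.dtype)

# Arithmetic propagates the missing value; reductions skip it by default.
plus_one = arr + 1
total = arr.sum()
print(plus_one, total)
```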
Categorical data#
pandas defines a custom data type for representing data that can take only a
limited, fixed set of values. The dtype of a Categorical can be described by
a CategoricalDtype.
| Type for categorical data with the categories and orderedness. | |
| An  | |
| Whether the categories have an ordered relationship. | 
Categorical data can be stored in a pandas.Categorical:
| Represent a categorical variable in classic R / S-plus fashion. | |
The alternative Categorical.from_codes() constructor can be used when you
have the categories and integer codes already:
| Make a Categorical type from codes and categories or dtype. | |
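A minimal sketch of `Categorical.from_codes`, assuming you already hold integer codes and a category list. The labels `"low"`, `"mid"`, and `"high"` are hypothetical.

```python
import pandas as pd

categories = ["low", "mid", "high"]  # hypothetical labels
codes = [0, 1, 0, 2]                 # each code is a position in `categories`

cat = pd.Categorical.from_codes(codes=codes, categories=categories, ordered=True)

print(list(cat))
print(cat.codes)
print(cat.dtype)
```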
The dtype information is available on the Categorical:
| The  | |
| The categories of this categorical. | |
| Whether the categories have an ordered relationship. | |
| The category codes of this categorical. | 
np.asarray(categorical) works by implementing the array interface. Be aware that this converts
the Categorical back to a NumPy array, so the categories and order information are not preserved.
| The numpy array interface. | |
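The loss of category information can be seen in a short sketch. The values and categories are illustrative.

```python
import numpy as np
import pandas as pd

cat = pd.Categorical(["b", "a", "b"], categories=["a", "b", "c"], ordered=True)

# Converting to a NumPy array keeps the values only.
arr = np.asarray(cat)

print(type(arr).__name__, list(arr))

# The unused category "c" and the ordering live on the Categorical, not on arr.
print(list(cat.categories), cat.ordered)
```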
A Categorical can be stored in a Series or DataFrame.
To create a Series of dtype category, use cat = s.astype(dtype) or
Series(..., dtype=dtype) where dtype is either
- the string 'category'
- an instance of CategoricalDtype.
If the Series is of dtype CategoricalDtype, Series.cat can be used to change the categorical
data. See Categorical accessor for more.
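Both ways of requesting a categorical dtype, plus a small use of the `Series.cat` accessor, can be sketched as follows. The data and category order are assumptions made for illustration.

```python
import pandas as pd

s = pd.Series(["a", "b", "a", "c"])

# Option 1: the string alias; categories are inferred from the data.
cat1 = s.astype("category")

# Option 2: an explicit CategoricalDtype that fixes categories and order.
dtype = pd.CategoricalDtype(categories=["c", "b", "a"], ordered=True)
cat2 = s.astype(dtype)

# The .cat accessor exposes and modifies the categorical data.
print(list(cat2.cat.categories), cat2.cat.codes.tolist())
renamed = cat2.cat.rename_categories({"a": "A"})
print(list(renamed.cat.categories))
```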
Sparse data#
Data where a single value is repeated many times (e.g. 0 or NaN) may
be stored efficiently as an arrays.SparseArray.
| An ExtensionArray for storing sparse data. | |
| Dtype for data stored in  | |
The Series.sparse accessor may be used to access sparse-specific attributes
and methods if the Series contains sparse values. See
Sparse accessor for more.
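A sketch of a SparseArray with an explicit fill value and the `Series.sparse` accessor. The data are made up, with 0 as the repeated value.

```python
import pandas as pd

# Only the two non-fill values are physically stored.
sparse = pd.arrays.SparseArray([0, 0, 1, 0, 0, 2, 0], fill_value=0)
print(sparse.npoints, sparse.density)

# When a Series holds sparse values, the .sparse accessor becomes available.
s = pd.Series(sparse)
print(s.sparse.fill_value, s.sparse.density)

# Convert back to an ordinary dense Series when needed.
dense = s.sparse.to_dense()
```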
Text data#
When working with text data, where each valid element is a string or missing,
we recommend using StringDtype (with the alias "string").
| Extension array for string data. | |
| Extension array for string data in a  | |
| Extension dtype for string data. | |
The Series.str accessor is available for Series backed by an arrays.StringArray.
See String handling for more.
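A short sketch of the `"string"` alias together with the `Series.str` accessor. The words are arbitrary examples.

```python
import pandas as pd

# dtype="string" requests StringDtype; None becomes a missing value.
s = pd.Series(["apple", None, "cherry"], dtype="string")
print(s.dtype)

# String methods operate element-wise and propagate missing values.
upper = s.str.upper()
lengths = s.str.len()
print(upper.tolist(), lengths.tolist())
```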
Boolean data with missing values#
The boolean dtype (with the alias "boolean") provides support for storing
boolean data (True, False) with missing values, which is not possible
with a bool numpy.ndarray.
| Array of boolean (True/False) data with missing values. | |
| Extension dtype for boolean data. |
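As a sketch, the nullable boolean array can be compared with what NumPy does for the same input. The logical operations shown rely on the three-valued logic that pandas applies to this dtype, where a missing operand only matters if it could change the result.

```python
import numpy as np
import pandas as pd

# NumPy falls back to a generic object array when a value is missing.
np_arr = np.array([True, False, None])
print(np_arr.dtype)

# The "boolean" alias gives a BooleanArray that stores the gap as <NA>.
arr = pd.array([True, False, None], dtype="boolean")
print(arr.dtype)

# <NA> | True is True, while <NA> & True stays missing.
either = arr | True
both = arr & True
print(either, both)
```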