pandas.Series.str.extractall#
- Series.str.extractall(pat, flags=0)[source]#
- Extract capture groups in the regex pat as columns in DataFrame. - For each subject string in the Series, extract groups from all matches of regular expression pat. When each subject string in the Series has exactly one match, extractall(pat).xs(0, level=’match’) is the same as extract(pat). - Parameters
- patstr
- Regular expression pattern with capturing groups. 
- flagsint, default 0 (no flags)
- A - remodule flag, for example- re.IGNORECASE. These allow to modify regular expression matching for things like case, spaces, etc. Multiple flags can be combined with the bitwise OR operator, for example- re.IGNORECASE | re.MULTILINE.
 
- Returns
- DataFrame
- A - DataFramewith one row for each match, and one column for each group. Its rows have a- MultiIndexwith first levels that come from the subject- Series. The last level is named ‘match’ and indexes the matches in each item of the- Series. Any capture group names in regular expression pat will be used for column names; otherwise capture group numbers will be used.
 
 - See also - extract
- Returns first match only (not all matches). 
 - Examples - A pattern with one group will return a DataFrame with one column. Indices with no matches will not appear in the result. - >>> s = pd.Series(["a1a2", "b1", "c1"], index=["A", "B", "C"]) >>> s.str.extractall(r"[ab](\d)") 0 match A 0 1 1 2 B 0 1 - Capture group names are used for column names of the result. - >>> s.str.extractall(r"[ab](?P<digit>\d)") digit match A 0 1 1 2 B 0 1 - A pattern with two groups will return a DataFrame with two columns. - >>> s.str.extractall(r"(?P<letter>[ab])(?P<digit>\d)") letter digit match A 0 a 1 1 a 2 B 0 b 1 - Optional groups that do not match are NaN in the result. - >>> s.str.extractall(r"(?P<letter>[ab])?(?P<digit>\d)") letter digit match A 0 a 1 1 a 2 B 0 b 1 C 0 NaN 1