[Basic Tools] S02E16 Using GroupBy (Part 2): Aggregate, Filter, Transform, and Apply

# 0. Episode Overview

1. The aggregate, filter, transform, and apply operations of GroupBy
2. Custom group keys
3. Grouping by aliases mapped from index labels

import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data1': range(6),
                   'data2': rng.randint(0, 10, 6)})

print(df)

   data1  data2 key
0      0      5   A
1      1      0   B
2      2      3   C
3      3      3   A
4      4      7   B
5      5      9   C

# 1. Aggregate

print(df.groupby('key').aggregate(['min', np.median, 'max']))

    data1            data2
      min median max   min median max
key
A       0    1.5   3     3    4.0   5
B       1    2.5   4     0    3.5   7
C       2    3.5   5     3    6.0   9

print(df.groupby('key').aggregate({'data1':'min','data2':'max'}))

     data1  data2
key
A        0      5
B        1      7
C        2      9
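
As a rough cross-check on the dictionary form above, the sketch below rebuilds the same table from separate per-column aggregations. It assumes the same df used throughout this episode; the names by_dict and by_hand are illustrative only and do not come from the original post.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data1': range(6),
                   'data2': rng.randint(0, 10, 6)})

grouped = df.groupby('key')

# Dictionary form: one aggregation per column.
by_dict = grouped.aggregate({'data1': 'min', 'data2': 'max'})

# The same table assembled by hand from per-column aggregations
# (by_hand is a hypothetical name used only for this comparison).
by_hand = pd.DataFrame({'data1': grouped['data1'].min(),
                        'data2': grouped['data2'].max()})

print(by_dict)
print(by_dict.equals(by_hand))
```

The dictionary form is mostly a convenience: it saves building the result column by column when each column needs a different aggregation.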

# 2. Filter

print(df)
print(df.groupby('key').std())
def filter_func(x):
    # x is the sub-DataFrame for one group; keep the group only if this is True
    return x['data2'].std() > 4

print(df.groupby('key').filter(filter_func))

   data1  data2 key
0      0      5   A
1      1      0   B
2      2      3   C
3      3      3   A
4      4      7   B
5      5      9   C

       data1     data2
key
A    2.12132  1.414214
B    2.12132  4.949747
C    2.12132  4.242641

   data1  data2 key
1      1      0   B
2      2      3   C
4      4      7   B
5      5      9   C

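filter_func returns one Boolean per group. When the standard deviation of data2 within a group is not greater than 4, it returns False and the whole group is filtered out. Here that removes group A, whose standard deviation is about 1.41.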
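
To make the group-level nature of filter more concrete, here is a minimal sketch that reproduces the same result with a boolean mask. It assumes the same df and filter_func as above; group_std and kept_by_mask are illustrative names, and the mask approach is an alternative shown for comparison, not the method used in the original post.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data1': range(6),
                   'data2': rng.randint(0, 10, 6)})

def filter_func(x):
    # x is the sub-DataFrame for one group
    return x['data2'].std() > 4

# filter keeps or drops every row of a group together.
kept = df.groupby('key').filter(filter_func)

# Alternative for comparison: broadcast each group's std back onto its rows,
# then select rows with an ordinary boolean mask.
group_std = df.groupby('key')['data2'].transform('std')
kept_by_mask = df[group_std > 4]

print(sorted(kept['key'].unique()))
print(kept.equals(kept_by_mask))
```

Both versions decide per group rather than per row, which is why all rows of a surviving group come back with their original index labels.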

# 3. Transform

print(df)
print(df.groupby('key').transform(lambda x: x - x.mean()))

`
   data1  data2 key
0      0      5   A
1      1      0   B
2      2      3   C
3      3      3   A