                             pandas

DataFrame

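     DataFrameGroupBy.size(): returns a Series
     The Series holds the number of rows in each group, indexed by the group keys.
     The example below groups a Series, for which size() behaves the same way.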

         >>> lst = ['a', 'a', 'b']
         >>> ser = pd.Series([1, 2, 3], index=lst)
         >>> ser
         a     1
         a     2
         b     3
         dtype: int64
         >>> ser.groupby(level=0).size()
         a    2
         b    1
         dtype: int64
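
The example above calls size() on a grouped Series. As a minimal sketch of the same call on a grouped DataFrame, the snippet below uses a small made-up frame; the column names key and value are assumptions chosen for illustration and do not come from the original data.

```python
import pandas as pd

# Hypothetical toy data: three rows, two of them sharing the key "a".
df = pd.DataFrame({"key": ["a", "a", "b"], "value": [1, 2, 3]})

# Group the DataFrame by one column and count the rows in each group.
sizes = df.groupby("key").size()

print(type(sizes).__name__)  # Series
print(sizes.index.name)      # key
print(sizes.to_dict())       # {'a': 2, 'b': 1}
```

The result is indexed by the group keys, so the grouping column becomes the index rather than a regular column.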

     Series.reset_index(name=''): returns a DataFrame
     name is the name to use for the column containing the original Series values (here the sizes).
     reset_index moves the index of the Series into a regular column and replaces it with a default integer index.

     >>> lst = ['a', 'a', 'b']
     >>> ser = pd.Series([1, 2, 3], index=lst)
     >>> ser.groupby(level=0).size()
     a    2
     b    1
     dtype: int64
     >>> a=ser.groupby(level=0).size()
     >>> a.reset_index(name="abc")
       index  abc
     0     a    2
     1     b    1
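
To make the effect of name more concrete, here is a small sketch that runs reset_index on the same sizes with and without the argument. The variable names named and unnamed are illustrative only. Because the Series of sizes has no name of its own here, omitting name leaves the value column with the default label 0.

```python
import pandas as pd

ser = pd.Series([1, 2, 3], index=["a", "a", "b"])
sizes = ser.groupby(level=0).size()

# With name=..., the column holding the sizes gets that label.
named = sizes.reset_index(name="abc")

# Without name, an unnamed Series falls back to the integer label 0.
unnamed = sizes.reset_index()

print(list(named.columns))    # ['index', 'abc']
print(list(unnamed.columns))  # ['index', 0]
```

In both cases the old index lands in a column called index, because the original index had no name.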

     >>> g.size()
     category  order_id
     alcohol   1325        1
               2985        1
               3209        1
               3819        1
               5466        5
                          ..
     snacks    3420505     1
               3420788     3
               3420812     4
               3421058     1
               3421063     1
     Length: 284264, dtype: int64


     If g is a groupby over a list of columns (here category and order_id), then g.size() returns a Series whose index is a MultiIndex with one level per grouping column.
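
A minimal sketch of that situation, using a small made-up frame in place of the real order data. The rows and the name items are assumptions chosen only for illustration; the column names category and order_id are taken from the output above.

```python
import pandas as pd

# Hypothetical stand-in for the order data: one row per purchased item.
items = pd.DataFrame({
    "category": ["alcohol", "alcohol", "snacks", "snacks", "snacks"],
    "order_id": [1, 2, 2, 2, 3],
})

# Grouping by a list of columns yields one group per (category, order_id) pair.
g = items.groupby(["category", "order_id"])
b = g.size()

print(type(b.index).__name__)   # MultiIndex
print(list(b.index.names))      # ['category', 'order_id']
print(b[("snacks", 2)])         # 2

# reset_index turns both index levels back into ordinary columns.
flat = b.reset_index(name="items_count")
print(list(flat.columns))       # ['category', 'order_id', 'items_count']
```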

     >>> b=g.size()
     >>> b.index
     MultiIndex([('alcohol',    1325),
                 ('alcohol',    2985),
                 ('alcohol',    3209),
                 ('alcohol',    3819),
                 ('alcohol',    5466),
                 ('alcohol',    5535),
                 ('alcohol',    9017),
                 ('alcohol',    9238),
                 ('alcohol',   13863),
                 ('alcohol',   20253),
                 ...
                 ( 'snacks', 3419531),
                 ( 'snacks', 3419613),
                 ( 'snacks', 3420090),
                 ( 'snacks', 3420257),

     >>> b.reset_index(name='items_count')
            category  order_id  items_count
     0       alcohol      1325            1
     1       alcohol      2985            1
     2       alcohol      3209            1
     3       alcohol      3819            1
     4       alcohol      5466            5
     284259   snacks   3420505            1