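App Data Analysis of the Google Play Store
The Google Play Store is the marketplace for downloading Android apps outside China. This case study analyzes the app information listed in the store; the results can inform app marketing and product design. The key skill in the analysis is data cleaning. Jupyter Notebook is recommended for following along.
To obtain the data file googleplaystore.csv, reply "获取数据" (get data) to the official account.
I. Reading the Data and Understanding It
First, look at the data. As shown below, the first row holds the column names: App (app name), Category, Rating, Reviews (number of reviews), Size (app size), Installs (number of installs), and so on. The file contains roughly 10,000 rows (10,841 to be exact).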
    import numpy as np
    import pandas as pd

    # Load only the first seven columns: App, Category, Rating, Reviews, Size, Installs, Type
    df = pd.read_csv('./googleplaystore.csv', usecols=(0, 1, 2, 3, 4, 5, 6))
The first five rows of the DataFrame:

| | App | Category | Rating | Reviews | Size | Installs | Type |
|---|---|---|---|---|---|---|---|
| 0 | Photo Editor & Candy Camera & Grid & ScrapBook | ART_AND_DESIGN | 4.1 | 159 | 19M | 10,000+ | Free |
| 1 | Coloring book moana | ART_AND_DESIGN | 3.9 | 967 | 14M | 500,000+ | Free |
| 2 | U Launcher Lite – FREE Live Cool Themes, Hide ... | ART_AND_DESIGN | 4.7 | 87510 | 8.7M | 5,000,000+ | Free |
| 3 | Sketch - Draw & Paint | ART_AND_DESIGN | 4.5 | 215644 | 25M | 50,000,000+ | Free |
| 4 | Pixel Draw - Number Art Coloring Book | ART_AND_DESIGN | 4.3 | 967 | 2.8M | 100,000+ | Free |
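Summary statistics for Rating, the only numeric column at this point. The maximum of 19.0 lies outside the 1 to 5 rating scale, a first hint that the data needs cleaning:

| | Rating |
|---|---|
| count | 9367.000000 |
| mean | 4.193338 |
| std | 0.537431 |
| min | 1.000000 |
| 25% | 4.000000 |
| 50% | 4.300000 |
| 75% | 4.500000 |
| max | 19.000000 |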
Non-null count per column:

App 10841
Category 10841
Rating 9367
Reviews 10841
Size 10841
Installs 10841
Type 10840
dtype: int64
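The non-null counts above show gaps in Rating and Type. A minimal sketch of reading those gaps off directly, assuming a small made-up DataFrame (the name toy and its values are illustrative and not taken from googleplaystore.csv):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real file; all values are invented for illustration.
toy = pd.DataFrame({
    "App": ["A", "B", "C", "D"],
    "Rating": [4.1, np.nan, 3.9, np.nan],
    "Type": ["Free", "Paid", None, "Free"],
})

# count() reports non-null cells; isna().sum() reports the gaps directly.
missing = toy.isna().sum()

# The mean of a boolean mask is the fraction of rows that are missing.
missing_share = toy.isna().mean()

print(missing)
print(missing_share)
```

Looking at the share rather than the raw count can help decide whether a column is better filled in or dropped.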
II. Data Cleaning

    # Count the distinct app names
    pd.unique(df['App']).size

9660

    # Frequency of each Category value, counting missing values too
    df['Category'].value_counts(dropna=False)
FAMILY 1972
GAME 1144
TOOLS 843
MEDICAL 463
BUSINESS 460
PRODUCTIVITY 424
PERSONALIZATION 392
COMMUNICATION 387
SPORTS 384
LIFESTYLE 382
FINANCE 366
HEALTH_AND_FITNESS 341
PHOTOGRAPHY 335
SOCIAL 295
NEWS_AND_MAGAZINES 283
SHOPPING 260
TRAVEL_AND_LOCAL 258
DATING 234
BOOKS_AND_REFERENCE 231
VIDEO_PLAYERS 175
EDUCATION 156
ENTERTAINMENT 149
MAPS_AND_NAVIGATION 137
FOOD_AND_DRINK 127
HOUSE_AND_HOME 88
LIBRARIES_AND_DEMO 85
AUTO_AND_VEHICLES 85
WEATHER 82
ART_AND_DESIGN 65
EVENTS 64
COMICS 60
PARENTING 60
BEAUTY 53
1.9 1
Name: Category, dtype: int64
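    # Inspect the row whose Category is the stray value '1.9'
    df[df['Category'] == '1.9']

| | App | Category | Rating | Reviews | Size | Installs | Type |
|---|---|---|---|---|---|---|---|
| 10472 | Life Made WI-Fi Touchscreen Photo Frame | 1.9 | 19.0 | 3.0M | 1,000+ | Free | 0 |

In row 10472 the Category field is missing, so every later value has shifted one column to the left. This row is also the source of the impossible Rating of 19.0. Drop it:

    df.drop(index=10472, inplace=True)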
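One possible way to find such shifted rows without spotting them by eye is to validate the Category column against the pattern that legitimate values follow. The sketch below assumes valid categories consist only of upper-case letters and underscores, which holds for the value counts listed above; the toy frame is invented for illustration.

```python
import pandas as pd

# Invented sample; the index labels are arbitrary.
toy = pd.DataFrame(
    {
        "App": ["Sketch", "Photo Frame", "Maps"],
        "Category": ["ART_AND_DESIGN", "1.9", "MAPS_AND_NAVIGATION"],
    },
    index=[3, 10472, 7],
)

# Assumption: a valid category is made of upper-case letters and underscores only.
valid = toy["Category"].str.fullmatch(r"[A-Z_]+")

bad_rows = toy[~valid]   # rows to inspect by hand
cleaned = toy[valid]     # rows that pass the check

print(bad_rows)
```

A pattern check like this only flags candidates; each flagged row still deserves a manual look before it is dropped.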
    # Distribution of ratings, including missing values
    df['Rating'].value_counts(dropna=False)
NaN 1474
4.4 1109
4.3 1076
4.5 1038
4.2 952
4.6 823
4.1 708
4.0 568
4.7 499
3.9 386
3.8 303
5.0 274
3.7 239
4.8 234
3.6 174
3.5 163
3.4 128
3.3 102
4.9 87
3.0 83
3.1 69
3.2 64
2.9 45
2.8 42
2.7 25
2.6 25
2.5 21
2.3 20
2.4 19
1.0 16
2.2 14
1.9 13
2.0 12
1.7 8
1.8 8
2.1 8
1.6 4
1.5 3
1.4 3
1.2 1
Name: Rating, dtype: int64
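    # Fill missing ratings with the mean rating
    # (assigning back is more reliable than calling fillna(inplace=True) on a selected column)
    df['Rating'] = df['Rating'].fillna(df['Rating'].mean())
    df['Rating'].value_counts(dropna=False)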
4.191757 1474
4.400000 1109
4.300000 1076
4.500000 1038
4.200000 952
4.600000 823
4.100000 708
4.000000 568
4.700000 499
3.900000 386
3.800000 303
5.000000 274
3.700000 239
4.800000 234
3.600000 174
3.500000 163
3.400000 128
3.300000 102
4.900000 87
3.000000 83
3.100000 69
3.200000 64
2.900000 45
2.800000 42
2.700000 25
2.600000 25
2.500000 21
2.300000 20
2.400000 19
1.000000 16
2.200000 14
1.900000 13
2.000000 12
1.800000 8
1.700000 8
2.100000 8
1.600000 4
1.500000 3
1.400000 3
1.200000 1
Name: Rating, dtype: int64
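Filling every gap with the global mean is the approach used above. As an alternative sketch (a different technique, not the author's), missing ratings could instead be filled with the median of the app's own category. The data below is made up.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Category": ["GAME", "GAME", "GAME", "TOOLS", "TOOLS"],
    "Rating": [4.0, 5.0, np.nan, 3.0, np.nan],
})

# Median rating of each category, broadcast back to the original row order.
category_median = toy.groupby("Category")["Rating"].transform("median")

# Only the missing cells are replaced; existing ratings stay untouched.
toy["Rating_filled"] = toy["Rating"].fillna(category_median)

print(toy)
```

Group-wise imputation keeps category differences visible, at the cost of being less stable for categories with very few rated apps.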
    # Distribution of review counts
    df['Reviews'].value_counts(dropna=False)
0 596
1 272
2 214
3 175
4 137
5 108
6 97
7 90
8 74
9 65
10 64
12 60
11 52
13 49
17 48
19 41
14 41
16 35
21 35
20 35
15 31
30 30
24 30
25 30
38 29
18 27
22 26
23 25
27 25
33 24
...
127229 1
2159 1
157264 1
6826 1
21262 1
37607 1
71269 1
67071 1
24215 1
63624 1
10753 1
159455 1
72596 1
8191 1
258556 1
10672 1
454412 1
56065 1
42329 1
84114 1
71432 1
815893 1
654419 1
9562 1
580 1
2976 1
18478 1
73821 1
1740 1
354 1
Name: Reviews, Length: 6001, dtype: int64
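    # Count the Reviews entries that consist only of digits
    df['Reviews'].str.isnumeric().sum()

10840

    # Show any rows whose Reviews value is not purely numeric (~ negates a boolean mask)
    df[~df['Reviews'].str.isnumeric()]

All 10,840 remaining rows pass the check, so this filter returns no rows.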
Summary statistics at this point:

| | Rating |
|---|---|
| count | 10840.000000 |
| mean | 4.191757 |
| std | 0.478907 |
| min | 1.000000 |
| 25% | 4.100000 |
| 50% | 4.200000 |
| 75% | 4.500000 |
| max | 5.000000 |
    # Convert Reviews from string to 64-bit integer
    df['Reviews'] = df['Reviews'].astype('i8')
    df.describe()
| | Rating | Reviews |
|---|---|---|
| count | 10840.000000 | 1.084000e+04 |
| mean | 4.191757 | 4.441529e+05 |
| std | 0.478907 | 2.927761e+06 |
| min | 1.000000 | 0.000000e+00 |
| 25% | 4.100000 | 3.800000e+01 |
| 50% | 4.200000 | 2.094000e+03 |
| 75% | 4.500000 | 5.477550e+04 |
| max | 5.000000 | 7.815831e+07 |
    # Distribution of app sizes
    df['Size'].value_counts(dropna=False)
Varies with device 1695
11M 198
12M 196
14M 194
13M 191
15M 184
17M 160
19M 154
26M 149
16M 149
25M 143
20M 139
21M 138
24M 136
10M 136
18M 133
23M 117
22M 114
29M 103
27M 97
28M 95
30M 84
33M 79
3.3M 77
37M 76
35M 72
31M 70
2.9M 69
2.3M 68
2.5M 68
...
245k 1
860k 1
67k 1
942k 1
629k 1
940k 1
208k 1
787k 1
785k 1
14k 1
921k 1
116k 1
234k 1
378k 1
865k 1
226k 1
122k 1
222k 1
400k 1
191k 1
549k 1
642k 1
209k 1
778k 1
540k 1
240k 1
663k 1
220k 1
11k 1
485k 1
Name: Size, Length: 461, dtype: int64
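    # Rewrite the unit suffixes in scientific notation so float() can parse them:
    # '19M' -> '19e+6', '245k' -> '245e+3'
    df['Size'] = df['Size'].str.replace('M', 'e+6', regex=False)
    df['Size'] = df['Size'].str.replace('k', 'e+3', regex=False)

    def is_convertable(v):
        """Return True if v can be parsed as a float."""
        try:
            float(v)
            return True
        except ValueError:
            return False

    df['Size'].apply(is_convertable)

    # List the values that still cannot be converted (~ negates a boolean mask)
    temp = df['Size'].apply(is_convertable)
    df.loc[~temp, 'Size'].value_counts()

    # 'Varies with device' cannot be parsed; use '0' as a temporary placeholder
    df['Size'] = df['Size'].str.replace('Varies with device', '0', regex=False)

    # Check again that nothing unconvertible is left
    temp = df['Size'].apply(is_convertable)
    df.loc[~temp, 'Size'].value_counts()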
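    # Convert to float first (to parse the exponent notation), then to integer bytes
    df['Size'] = df['Size'].astype('f8').astype('i8')

    # Replace the 0 placeholders with the column mean.
    # Note: this mean still includes the placeholder zeros, so it understates the typical size.
    df['Size'] = df['Size'].replace(0, df['Size'].mean())
    df.describe()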
| | Rating | Reviews | Size | Installs |
|---|---|---|---|---|
| count | 10840.000000 | 1.084000e+04 | 1.084000e+04 | 1.084000e+04 |
| mean | 4.191757 | 4.441529e+05 | 2.099045e+07 | 1.546434e+07 |
| std | 0.478907 | 2.927761e+06 | 2.078345e+07 | 8.502936e+07 |
| min | 1.000000 | 0.000000e+00 | 8.500000e+03 | 0.000000e+00 |
| 25% | 4.100000 | 3.800000e+01 | 5.900000e+06 | 1.000000e+03 |
| 50% | 4.200000 | 2.094000e+03 | 1.800000e+07 | 1.000000e+05 |
| 75% | 4.500000 | 5.477550e+04 | 2.600000e+07 | 5.000000e+06 |
| max | 5.000000 | 7.815831e+07 | 1.000000e+08 | 1.000000e+09 |
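The Size cleaning above uses 0 as a placeholder and then overwrites it with a mean that includes those zeros. A hedged alternative is to parse each string into bytes and leave unknown sizes as NaN, so that mean() skips them. Here parse_size is a hypothetical helper written for this sketch, not part of pandas, and the sample values are illustrative.

```python
import numpy as np
import pandas as pd

def parse_size(text):
    """Convert strings such as '19M' or '245k' to bytes; anything else becomes NaN."""
    units = {"k": 1e3, "M": 1e6}
    if isinstance(text, str) and text and text[-1] in units:
        try:
            return float(text[:-1]) * units[text[-1]]
        except ValueError:
            return np.nan
    return np.nan

sizes = pd.Series(["19M", "8.7M", "245k", "Varies with device"])
size_bytes = sizes.apply(parse_size)

# mean() ignores NaN by default, so unknown sizes do not drag the average down.
mean_size = size_bytes.mean()

print(size_bytes)
print(mean_size)
```

If a filled column is still wanted, size_bytes.fillna(mean_size) would fill the gaps with a mean computed from known sizes only.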
    # Distribution of install-count buckets
    df['Installs'].value_counts()

1,000,000+ 1579
10,000,000+ 1252
100,000+ 1169
10,000+ 1054
1,000+ 907
5,000,000+ 752
100+ 719
500,000+ 539
50,000+ 479
5,000+ 477
100,000,000+ 409
10+ 386
500+ 330
50,000,000+ 289
50+ 205
5+ 82
500,000,000+ 72
1+ 67
1,000,000,000+ 58
0+ 14
0 1
Name: Installs, dtype: int64
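    # Remove the literal '+' and ',' characters
    # (regex=False keeps '+' from being interpreted as a regex operator)
    df['Installs'] = df['Installs'].str.replace('+', '', regex=False)
    df['Installs'] = df['Installs'].str.replace(',', '', regex=False)

    # Convert to 64-bit integer
    df['Installs'] = df['Installs'].astype('i8')
    df.describe()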
| | Rating | Reviews | Size | Installs |
|---|---|---|---|---|
| count | 10840.000000 | 1.084000e+04 | 1.084000e+04 | 1.084000e+04 |
| mean | 4.191757 | 4.441529e+05 | 2.099045e+07 | 1.546434e+07 |
| std | 0.478907 | 2.927761e+06 | 2.078345e+07 | 8.502936e+07 |
| min | 1.000000 | 0.000000e+00 | 8.500000e+03 | 0.000000e+00 |
| 25% | 4.100000 | 3.800000e+01 | 5.900000e+06 | 1.000000e+03 |
| 50% | 4.200000 | 2.094000e+03 | 1.800000e+07 | 1.000000e+05 |
| 75% | 4.500000 | 5.477550e+04 | 2.600000e+07 | 5.000000e+06 |
| max | 5.000000 | 7.815831e+07 | 1.000000e+08 | 1.000000e+09 |
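An alternative sketch for the Installs conversion uses pd.to_numeric with errors='coerce', which turns unparseable entries into NaN rather than raising an exception. The sample values are illustrative; 'Free' mimics the kind of stray value found in the shifted row that was dropped earlier.

```python
import pandas as pd

installs = pd.Series(["10,000+", "500,000+", "0", "Free"])

# Strip the literal '+' and ',' characters first.
stripped = (
    installs.str.replace("+", "", regex=False)
            .str.replace(",", "", regex=False)
)

# Anything that still is not a number becomes NaN instead of raising.
installs_num = pd.to_numeric(stripped, errors="coerce")

print(installs_num)
```

Because NaN forces a float dtype, the result is float64; rows with NaN can then be inspected or dropped before casting to an integer type.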
    # Distribution of Free vs. Paid, including missing values
    df['Type'].value_counts(dropna=False)
Free 10039
Paid 800
NaN 1
Name: Type, dtype: int64
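    # Locate the row with a missing Type
    df[df['Type'].isnull()]

| | App | Category | Rating | Reviews | Size | Installs | Type |
|---|---|---|---|---|---|---|---|
| 9148 | Command & Conquer: Rivals | FAMILY | 4.191757 | 0 | 18152090 | 0 | NaN |

    # Only one row is affected, so drop it
    df.drop(index=9148, inplace=True)

    # Keep the first occurrence of each app name, then recount
    df.drop_duplicates('App', inplace=True)
    df.count()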
App 9658
Category 9658
Rating 9658
Reviews 9658
Size 9658
Installs 9658
Type 9658
dtype: int64
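drop_duplicates keeps the first occurrence by default, which is not necessarily the most informative copy of an app. One possible refinement is sketched below, assuming the copy with the most reviews is the one worth keeping (an assumption for illustration, not something the dataset documents). The toy data is invented.

```python
import pandas as pd

toy = pd.DataFrame({
    "App": ["Chat", "Chat", "Notes"],
    "Reviews": [100, 250, 40],
})

# Sort so the most-reviewed copy of each app comes first, keep that copy,
# then restore the original row order.
deduped = (
    toy.sort_values("Reviews", ascending=False)
       .drop_duplicates("App", keep="first")
       .sort_index()
)

print(deduped)
```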
III. Data Analysis: Dimensional Analysis and Correlation Analysis

Summary statistics of the cleaned data:

| | Rating | Reviews | Size | Installs |
|---|---|---|---|---|
| count | 9658.000000 | 9.658000e+03 | 9.658000e+03 | 9.658000e+03 |
| mean | 4.176046 | 2.166150e+05 | 2.011053e+07 | 7.778312e+06 |
| std | 0.494383 | 1.831413e+06 | 2.040865e+07 | 5.376100e+07 |
| min | 1.000000 | 0.000000e+00 | 8.500000e+03 | 0.000000e+00 |
| 25% | 4.000000 | 2.500000e+01 | 5.300000e+06 | 1.000000e+03 |
| 50% | 4.200000 | 9.670000e+02 | 1.600000e+07 | 1.000000e+05 |
| 75% | 4.500000 | 2.940800e+04 | 2.500000e+07 | 1.000000e+06 |
| max | 5.000000 | 7.815831e+07 | 1.000000e+08 | 1.000000e+09 |
    # Number of distinct categories
    df.Category.unique().size

    # Number of apps per category, largest first
    df.groupby('Category').count().sort_values('App', ascending=False)
| Category | App | Rating | Reviews | Size | Installs | Type |
|---|---|---|---|---|---|---|
| FAMILY | 1831 | 1831 | 1831 | 1831 | 1831 | 1831 |
| GAME | 959 | 959 | 959 | 959 | 959 | 959 |
| TOOLS | 827 | 827 | 827 | 827 | 827 | 827 |
| BUSINESS | 420 | 420 | 420 | 420 | 420 | 420 |
| MEDICAL | 395 | 395 | 395 | 395 | 395 | 395 |
| PERSONALIZATION | 376 | 376 | 376 | 376 | 376 | 376 |
| PRODUCTIVITY | 374 | 374 | 374 | 374 | 374 | 374 |
| LIFESTYLE | 369 | 369 | 369 | 369 | 369 | 369 |
| FINANCE | 345 | 345 | 345 | 345 | 345 | 345 |
| SPORTS | 325 | 325 | 325 | 325 | 325 | 325 |
| COMMUNICATION | 315 | 315 | 315 | 315 | 315 | 315 |
| HEALTH_AND_FITNESS | 288 | 288 | 288 | 288 | 288 | 288 |
| PHOTOGRAPHY | 281 | 281 | 281 | 281 | 281 | 281 |
| NEWS_AND_MAGAZINES | 254 | 254 | 254 | 254 | 254 | 254 |
| SOCIAL | 239 | 239 | 239 | 239 | 239 | 239 |
| BOOKS_AND_REFERENCE | 222 | 222 | 222 | 222 | 222 | 222 |
| TRAVEL_AND_LOCAL | 219 | 219 | 219 | 219 | 219 | 219 |
| SHOPPING | 202 | 202 | 202 | 202 | 202 | 202 |
| DATING | 171 | 171 | 171 | 171 | 171 | 171 |
| VIDEO_PLAYERS | 163 | 163 | 163 | 163 | 163 | 163 |
| MAPS_AND_NAVIGATION | 131 | 131 | 131 | 131 | 131 | 131 |
| EDUCATION | 119 | 119 | 119 | 119 | 119 | 119 |
| FOOD_AND_DRINK | 112 | 112 | 112 | 112 | 112 | 112 |
| ENTERTAINMENT | 102 | 102 | 102 | 102 | 102 | 102 |
| AUTO_AND_VEHICLES | 85 | 85 | 85 | 85 | 85 | 85 |
| LIBRARIES_AND_DEMO | 84 | 84 | 84 | 84 | 84 | 84 |
| WEATHER | 79 | 79 | 79 | 79 | 79 | 79 |
| HOUSE_AND_HOME | 74 | 74 | 74 | 74 | 74 | 74 |
| EVENTS | 64 | 64 | 64 | 64 | 64 | 64 |
| ART_AND_DESIGN | 64 | 64 | 64 | 64 | 64 | 64 |
| PARENTING | 60 | 60 | 60 | 60 | 60 | 60 |
| COMICS | 56 | 56 | 56 | 56 | 56 | 56 |
| BEAUTY | 53 | 53 | 53 | 53 | 53 | 53 |
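    # Mean of the numeric columns per category, sorted by mean installs.
    # numeric_only=True skips the text columns App and Type, which newer pandas versions refuse to average.
    df.groupby('Category').mean(numeric_only=True).sort_values('Installs', ascending=False)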
| Category | Rating | Reviews | Size | Installs |
|---|---|---|---|---|
| COMMUNICATION | 4.134647 | 907337.676190 | 1.289365e+07 | 3.504215e+07 |
| VIDEO_PLAYERS | 4.058137 | 414015.754601 | 1.631384e+07 | 2.409143e+07 |
| SOCIAL | 4.238926 | 953672.807531 | 1.643765e+07 | 2.296179e+07 |
| ENTERTAINMENT | 4.135294 | 340810.294118 | 2.122137e+07 | 2.072216e+07 |
| PHOTOGRAPHY | 4.159614 | 374915.551601 | 1.618811e+07 | 1.654501e+07 |
| PRODUCTIVITY | 4.185022 | 148638.098930 | 1.363180e+07 | 1.548955e+07 |
| GAME | 4.244643 | 648903.763295 | 3.973997e+07 | 1.447229e+07 |
| TRAVEL_AND_LOCAL | 4.087380 | 122464.570776 | 2.293315e+07 | 1.321866e+07 |
| TOOLS | 4.059615 | 277335.644498 | 9.870441e+06 | 9.675661e+06 |
| NEWS_AND_MAGAZINES | 4.135385 | 91063.889764 | 1.365578e+07 | 9.327629e+06 |
| BOOKS_AND_REFERENCE | 4.308393 | 75321.234234 | 1.376752e+07 | 7.504367e+06 |
| SHOPPING | 4.225835 | 220553.118812 | 1.593927e+07 | 6.932420e+06 |
| WEATHER | 4.238510 | 155634.987342 | 1.427317e+07 | 4.570893e+06 |
| PERSONALIZATION | 4.303077 | 142401.808511 | 1.168523e+07 | 4.075784e+06 |
| HEALTH_AND_FITNESS | 4.235199 | 74171.371528 | 2.018017e+07 | 3.972300e+06 |
| MAPS_AND_NAVIGATION | 4.051854 | 135337.007634 | 1.669496e+07 | 3.841846e+06 |
| SPORTS | 4.211275 | 108765.578462 | 2.333144e+07 | 3.373768e+06 |
| EDUCATION | 4.362956 | 112303.764706 | 1.882895e+07 | 2.965983e+06 |
| FAMILY | 4.181137 | 78550.239214 | 2.666982e+07 | 2.418319e+06 |
| FOOD_AND_DRINK | 4.175461 | 56473.464286 | 1.999241e+07 | 1.891060e+06 |
| ART_AND_DESIGN | 4.349614 | 22175.046875 | 1.255163e+07 | 1.786533e+06 |
| BUSINESS | 4.133347 | 23548.202381 | 1.431609e+07 | 1.659916e+06 |
| LIFESTYLE | 4.111489 | 32066.859079 | 1.515860e+07 | 1.365375e+06 |
| FINANCE | 4.125060 | 36701.756522 | 1.747266e+07 | 1.319851e+06 |
| HOUSE_AND_HOME | 4.156771 | 26079.013514 | 1.632407e+07 | 1.313682e+06 |
| DATING | 4.018100 | 21190.315789 | 1.583592e+07 | 8.241293e+05 |
| COMICS | 4.181848 | 41822.696429 | 1.433960e+07 | 8.032348e+05 |
| LIBRARIES_AND_DEMO | 4.181371 | 10795.607143 | 1.087250e+07 | 6.309037e+05 |
| AUTO_AND_VEHICLES | 4.190601 | 13690.188235 | 1.981538e+07 | 6.250613e+05 |
| PARENTING | 4.281960 | 15972.183333 | 2.207688e+07 | 5.253518e+05 |
| BEAUTY | 4.260553 | 7476.226415 | 1.428892e+07 | 5.131519e+05 |
| EVENTS | 4.363178 | 2515.906250 | 1.442185e+07 | 2.495806e+05 |
| MEDICAL | 4.173252 | 2994.863291 | 1.911849e+07 | 9.669159e+04 |
    # Same per-category means, sorted by mean review count
    df.groupby('Category').mean(numeric_only=True).sort_values('Reviews', ascending=False)
| Category | Rating | Reviews | Size | Installs |
|---|---|---|---|---|
| SOCIAL | 4.238926 | 953672.807531 | 1.643765e+07 | 2.296179e+07 |
| COMMUNICATION | 4.134647 | 907337.676190 | 1.289365e+07 | 3.504215e+07 |
| GAME | 4.244643 | 648903.763295 | 3.973997e+07 | 1.447229e+07 |
| VIDEO_PLAYERS | 4.058137 | 414015.754601 | 1.631384e+07 | 2.409143e+07 |
| PHOTOGRAPHY | 4.159614 | 374915.551601 | 1.618811e+07 | 1.654501e+07 |
| ENTERTAINMENT | 4.135294 | 340810.294118 | 2.122137e+07 | 2.072216e+07 |
| TOOLS | 4.059615 | 277335.644498 | 9.870441e+06 | 9.675661e+06 |
| SHOPPING | 4.225835 | 220553.118812 | 1.593927e+07 | 6.932420e+06 |
| WEATHER | 4.238510 | 155634.987342 | 1.427317e+07 | 4.570893e+06 |
| PRODUCTIVITY | 4.185022 | 148638.098930 | 1.363180e+07 | 1.548955e+07 |
| PERSONALIZATION | 4.303077 | 142401.808511 | 1.168523e+07 | 4.075784e+06 |
| MAPS_AND_NAVIGATION | 4.051854 | 135337.007634 | 1.669496e+07 | 3.841846e+06 |
| TRAVEL_AND_LOCAL | 4.087380 | 122464.570776 | 2.293315e+07 | 1.321866e+07 |
| EDUCATION | 4.362956 | 112303.764706 | 1.882895e+07 | 2.965983e+06 |
| SPORTS | 4.211275 | 108765.578462 | 2.333144e+07 | 3.373768e+06 |
| NEWS_AND_MAGAZINES | 4.135385 | 91063.889764 | 1.365578e+07 | 9.327629e+06 |
| FAMILY | 4.181137 | 78550.239214 | 2.666982e+07 | 2.418319e+06 |
| BOOKS_AND_REFERENCE | 4.308393 | 75321.234234 | 1.376752e+07 | 7.504367e+06 |
| HEALTH_AND_FITNESS | 4.235199 | 74171.371528 | 2.018017e+07 | 3.972300e+06 |
| FOOD_AND_DRINK | 4.175461 | 56473.464286 | 1.999241e+07 | 1.891060e+06 |
| COMICS | 4.181848 | 41822.696429 | 1.433960e+07 | 8.032348e+05 |
| FINANCE | 4.125060 | 36701.756522 | 1.747266e+07 | 1.319851e+06 |
| LIFESTYLE | 4.111489 | 32066.859079 | 1.515860e+07 | 1.365375e+06 |
| HOUSE_AND_HOME | 4.156771 | 26079.013514 | 1.632407e+07 | 1.313682e+06 |
| BUSINESS | 4.133347 | 23548.202381 | 1.431609e+07 | 1.659916e+06 |
| ART_AND_DESIGN | 4.349614 | 22175.046875 | 1.255163e+07 | 1.786533e+06 |
| DATING | 4.018100 | 21190.315789 | 1.583592e+07 | 8.241293e+05 |
| PARENTING | 4.281960 | 15972.183333 | 2.207688e+07 | 5.253518e+05 |
| AUTO_AND_VEHICLES | 4.190601 | 13690.188235 | 1.981538e+07 | 6.250613e+05 |
| LIBRARIES_AND_DEMO | 4.181371 | 10795.607143 | 1.087250e+07 | 6.309037e+05 |
| BEAUTY | 4.260553 | 7476.226415 | 1.428892e+07 | 5.131519e+05 |
| MEDICAL | 4.173252 | 2994.863291 | 1.911849e+07 | 9.669159e+04 |
| EVENTS | 4.363178 | 2515.906250 | 1.442185e+07 | 2.495806e+05 |
    # Same per-category means, sorted by mean rating
    df.groupby('Category').mean(numeric_only=True).sort_values('Rating', ascending=False)
Mean values for each Type and Category pair, sorted by mean Reviews in descending order. The display is truncated in the middle, so 60 of the 63 rows are shown:

| Type | Category | Rating | Reviews | Size | Installs |
|---|---|---|---|---|---|
| Free | COMMUNICATION | 4.139080 | 992108.173611 | 1.350167e+07 | 3.832263e+07 |
| Free | SOCIAL | 4.243693 | 965794.741525 | 1.656355e+07 | 2.325365e+07 |
| Free | GAME | 4.233936 | 707783.190422 | 4.036479e+07 | 1.580151e+07 |
| Free | VIDEO_PLAYERS | 4.057084 | 424347.176101 | 1.636918e+07 | 2.469705e+07 |
| Free | PHOTOGRAPHY | 4.167498 | 401664.270992 | 1.667036e+07 | 1.773767e+07 |
| Free | ENTERTAINMENT | 4.126000 | 347526.410000 | 2.093427e+07 | 2.113460e+07 |
| Free | TOOLS | 4.047697 | 305987.504673 | 1.033869e+07 | 1.068097e+07 |
| Free | SHOPPING | 4.223093 | 222756.230000 | 1.606466e+07 | 7.001693e+06 |
| Free | PERSONALIZATION | 4.277251 | 180508.227119 | 1.024622e+07 | 5.183851e+06 |
| Free | WEATHER | 4.226064 | 171249.619718 | 1.429121e+07 | 5.074486e+06 |
| Free | PRODUCTIVITY | 4.183759 | 160170.312139 | 1.411873e+07 | 1.673896e+07 |
| Free | MAPS_AND_NAVIGATION | 4.059467 | 140650.476190 | 1.652609e+07 | 3.993340e+06 |
| Free | TRAVEL_AND_LOCAL | 4.084875 | 129476.657005 | 2.206258e+07 | 1.398408e+07 |
| Free | SPORTS | 4.208242 | 116937.468439 | 2.361516e+07 | 3.638640e+06 |
| Free | EDUCATION | 4.349494 | 115908.721739 | 1.813604e+07 | 3.063913e+06 |
| Free | NEWS_AND_MAGAZINES | 4.130111 | 91785.821429 | 1.364591e+07 | 9.401636e+06 |
| Free | BOOKS_AND_REFERENCE | 4.321794 | 86183.082474 | 1.393813e+07 | 8.587352e+06 |
| Free | FAMILY | 4.171360 | 85068.516990 | 2.667952e+07 | 2.674327e+06 |
| Free | HEALTH_AND_FITNESS | 4.229562 | 78078.981685 | 2.011587e+07 | 4.188822e+06 |
| Free | FOOD_AND_DRINK | 4.172288 | 57469.372727 | 2.016998e+07 | 1.924898e+06 |
| Free | COMICS | 4.181848 | 41822.696429 | 1.433960e+07 | 8.032348e+05 |
| Free | FINANCE | 4.135910 | 38533.256098 | 1.791964e+07 | 1.387692e+06 |
| Free | LIFESTYLE | 4.104136 | 33672.140000 | 1.521014e+07 | 1.436127e+06 |
| Free | HOUSE_AND_HOME | 4.156771 | 26079.013514 | 1.632407e+07 | 1.313682e+06 |
| Free | BUSINESS | 4.134144 | 24179.198529 | 1.438947e+07 | 1.708216e+06 |
| Free | ART_AND_DESIGN | 4.330742 | 23230.114754 | 1.291318e+07 | 1.874133e+06 |
| Free | DATING | 4.025574 | 21951.127273 | 1.603256e+07 | 8.540288e+05 |
| Paid | FAMILY | 4.269186 | 19850.120219 | 2.658246e+07 | 1.128405e+05 |
| Paid | GAME | 4.359153 | 19181.109756 | 3.305742e+07 | 2.560971e+05 |
| Paid | WEATHER | 4.348970 | 17055.125000 | 1.411302e+07 | 1.015000e+05 |
| ... | ... | ... | ... | ... | ... |
| Paid | EDUCATION | 4.750000 | 8661.250000 | 3.875000e+07 | 1.505000e+05 |
| Free | BEAUTY | 4.260553 | 7476.226415 | 1.428892e+07 | 5.131519e+05 |
| Paid | SPORTS | 4.249313 | 6276.458333 | 1.977301e+07 | 5.182562e+04 |
| Paid | PRODUCTIVITY | 4.200628 | 6132.892857 | 7.614763e+06 | 5.043054e+04 |
| Paid | PHOTOGRAPHY | 4.050896 | 6064.789474 | 9.538167e+06 | 9.888105e+04 |
| Paid | ENTERTAINMENT | 4.600000 | 5004.500000 | 3.557604e+07 | 1.000000e+05 |
| Paid | PARENTING | 3.350000 | 4183.000000 | 1.322604e+07 | 2.505000e+04 |
| Free | MEDICAL | 4.159613 | 3727.451923 | 1.949135e+07 | 1.206165e+05 |
| Paid | PERSONALIZATION | 4.397137 | 3619.172840 | 1.692605e+07 | 4.023202e+04 |
| Paid | VIDEO_PLAYERS | 4.100000 | 3341.750000 | 1.411407e+07 | 1.775000e+04 |
| Paid | COMMUNICATION | 4.087362 | 3119.037037 | 6.408087e+06 | 5.037222e+04 |
| Paid | HEALTH_AND_FITNESS | 4.337802 | 3052.866667 | 2.135042e+07 | 3.160733e+04 |
| Free | EVENTS | 4.365899 | 2555.841270 | 1.454442e+07 | 2.535422e+05 |
| Paid | LIFESTYLE | 4.246935 | 2495.894737 | 1.420927e+07 | 6.205842e+04 |
| Paid | TOOLS | 4.174056 | 2204.320513 | 5.374062e+06 | 2.214668e+04 |
| Paid | BUSINESS | 4.106273 | 2094.333333 | 1.182101e+07 | 1.773125e+04 |
| Paid | FOOD_AND_DRINK | 4.350000 | 1698.500000 | 1.022604e+07 | 3.000000e+04 |
| Paid | TRAVEL_AND_LOCAL | 4.130586 | 1506.083333 | 3.795035e+07 | 1.525500e+04 |
| Paid | MAPS_AND_NAVIGATION | 3.860000 | 1437.600000 | 2.095042e+07 | 2.422000e+04 |
| Paid | AUTO_AND_VEHICLES | 4.327838 | 1387.666667 | 1.705070e+07 | 1.671667e+04 |
| Paid | FINANCE | 3.915708 | 1364.588235 | 8.848529e+06 | 1.091776e+04 |
| Paid | ART_AND_DESIGN | 4.733333 | 722.000000 | 5.200000e+06 | 5.333333e+03 |
| Paid | DATING | 3.812545 | 268.000000 | 1.042835e+07 | 1.891667e+03 |
| Paid | SHOPPING | 4.500000 | 242.000000 | 3.400000e+06 | 5.050000e+03 |
| Paid | MEDICAL | 4.224520 | 241.036145 | 1.771691e+07 | 6.757024e+03 |
| Paid | NEWS_AND_MAGAZINES | 4.800000 | 100.500000 | 1.490000e+07 | 2.750000e+03 |
| Paid | SOCIAL | 3.863919 | 80.666667 | 6.533333e+06 | 2.000000e+03 |
| Paid | BOOKS_AND_REFERENCE | 4.215541 | 64.142857 | 1.258550e+07 | 8.327143e+02 |
| Paid | LIBRARIES_AND_DEMO | 4.191757 | 4.000000 | 4.700000e+06 | 1.000000e+02 |
| Paid | EVENTS | 4.191757 | 0.000000 | 6.700000e+06 | 1.000000e+00 |

63 rows × 4 columns
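    # Reviews per install for every (Type, Category) pair, computed from the group means.
    # numeric_only=True skips the text column App, which newer pandas versions refuse to average.
    g = df.groupby(['Type', 'Category']).mean(numeric_only=True)
    (g['Reviews'] / g['Installs']).sort_values(ascending=False)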
Type Category
Paid VIDEO_PLAYERS 0.188268
Paid FAMILY 0.175913
Paid WEATHER 0.168031
Paid PARENTING 0.166986
Paid DATING 0.141674
Paid ART_AND_DESIGN 0.135375
Paid FINANCE 0.124988
Paid PRODUCTIVITY 0.121611
Paid SPORTS 0.121107
Paid BUSINESS 0.118115
Paid TOOLS 0.099533
Paid TRAVEL_AND_LOCAL 0.098727
Paid HEALTH_AND_FITNESS 0.096587
Paid PERSONALIZATION 0.089958
Paid AUTO_AND_VEHICLES 0.083011
Paid BOOKS_AND_REFERENCE 0.077029
Paid GAME 0.074898
Paid COMMUNICATION 0.061920
Paid PHOTOGRAPHY 0.061334
Paid MAPS_AND_NAVIGATION 0.059356
Paid EDUCATION 0.057550
Paid FOOD_AND_DRINK 0.056617
Free COMICS 0.052068
Paid ENTERTAINMENT 0.050045
Paid SHOPPING 0.047921
Free GAME 0.044792
Free SOCIAL 0.041533
Paid SOCIAL 0.040333
Paid LIFESTYLE 0.040218
Paid LIBRARIES_AND_DEMO 0.040000
...
Free MAPS_AND_NAVIGATION 0.035221
Free PERSONALIZATION 0.034821
Free WEATHER 0.033747
Free SPORTS 0.032138
Free SHOPPING 0.031815
Free FAMILY 0.031809
Free MEDICAL 0.030903
Free PARENTING 0.030185
Free FOOD_AND_DRINK 0.029856
Free TOOLS 0.028648
Free FINANCE 0.027768
Free COMMUNICATION 0.025888
Free DATING 0.025703
Free LIFESTYLE 0.023446
Free PHOTOGRAPHY 0.022645
Free AUTO_AND_VEHICLES 0.021844
Free HOUSE_AND_HOME 0.019852
Free HEALTH_AND_FITNESS 0.018640
Free VIDEO_PLAYERS 0.017182
Free LIBRARIES_AND_DEMO 0.017111
Free ENTERTAINMENT 0.016443
Free BEAUTY 0.014569
Free BUSINESS 0.014155
Free ART_AND_DESIGN 0.012395
Free EVENTS 0.010081
Free BOOKS_AND_REFERENCE 0.010036
Free NEWS_AND_MAGAZINES 0.009763
Free PRODUCTIVITY 0.009569
Free TRAVEL_AND_LOCAL 0.009259
Paid EVENTS 0.000000
Length: 63, dtype: float64
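The ratio above divides the group mean of Reviews by the group mean of Installs, so the biggest apps in each group dominate the result. A sketch of that calculation next to the mean of per-app ratios, on invented numbers (the toy frame and its values are assumptions for illustration):

```python
import pandas as pd

toy = pd.DataFrame({
    "Type": ["Free", "Free", "Paid", "Paid"],
    "Category": ["GAME", "GAME", "GAME", "GAME"],
    "Reviews": [10, 1000, 5, 20],
    "Installs": [1000, 10000, 50, 100],
})

# Ratio of group means (the approach used in the text).
means = toy.groupby(["Type", "Category"])[["Reviews", "Installs"]].mean()
ratio_of_means = means["Reviews"] / means["Installs"]

# Mean of per-app ratios: every app counts equally.
# Apps with zero installs are skipped to avoid division by zero.
nonzero = toy[toy["Installs"] > 0]
per_app = nonzero["Reviews"] / nonzero["Installs"]
mean_of_ratios = per_app.groupby([nonzero["Type"], nonzero["Category"]]).mean()

print(ratio_of_means)
print(mean_of_ratios)
```

The two measures answer slightly different questions: the first describes the category as a whole, the second describes a typical app in it.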
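Correlation matrix of the numeric columns. Reviews and Installs are moderately correlated (0.63); every other pair is close to zero:

| | Rating | Reviews | Size | Installs |
|---|---|---|---|---|
| Rating | 1.000000 | 0.054337 | 0.052751 | 0.039245 |
| Reviews | 0.054337 | 1.000000 | 0.080578 | 0.625164 |
| Size | 0.052751 | 0.080578 | 1.000000 | 0.050675 |
| Installs | 0.039245 | 0.625164 | 0.050675 | 1.000000 |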
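Reviews and Installs are heavily right-skewed (their medians are orders of magnitude below their maxima), which can distort Pearson coefficients. A sketch, on made-up values, of two common alternatives: a rank-based (Spearman) correlation computed as Pearson on ranks, and Pearson on log1p-transformed values.

```python
import numpy as np
import pandas as pd

# Invented, strongly skewed sample.
toy = pd.DataFrame({
    "Reviews": [0, 10, 100, 1_000, 1_000_000],
    "Installs": [10, 100, 5_000, 100_000, 100_000_000],
})

pearson = toy.corr()                 # default Pearson on raw values
spearman = toy.rank().corr()         # Pearson on ranks, i.e. Spearman correlation
log_pearson = np.log1p(toy).corr()   # Pearson on log(1 + x), which tolerates zeros

print(pearson.loc["Reviews", "Installs"])
print(spearman.loc["Reviews", "Installs"])
print(log_pearson.loc["Reviews", "Installs"])
```

When the raw and rank-based coefficients differ noticeably, a few very large apps are likely driving the raw figure.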
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY-NC-SA 4.0. Please credit SQL社区 when reposting.