Probability Theorems#
import pandas as pd
import numpy as np
df = pd.read_csv("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv")
df = df.dropna()
df.head()
 | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
---|---|---|---|---|---|---|---|
0 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | MALE |
1 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | FEMALE |
2 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | FEMALE |
4 | Adelie | Torgersen | 36.7 | 19.3 | 193.0 | 3450.0 | FEMALE |
5 | Adelie | Torgersen | 39.3 | 20.6 | 190.0 | 3650.0 | MALE |
df.shape
(333, 7)
set(df.species), set(df.island), set(df.sex)
({'Adelie', 'Chinstrap', 'Gentoo'},
{'Biscoe', 'Dream', 'Torgersen'},
{'FEMALE', 'MALE'})
## Fraction of Adelie species
adelie = (df['species'] == 'Adelie')
adelie.head()
0 True
1 True
2 True
4 True
5 True
Name: species, dtype: bool
adelie.sum()
146
# The mean of a Boolean Series is the fraction of True values,
# so this is the fraction of penguins that are Adelie
adelie.mean()
0.43843843843843844
def prob(A):
    """Return the probability of A.

    A is a Boolean Series; its mean is the fraction of True values.
    """
    return A.mean()
prob(adelie)
0.43843843843843844
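To see why the mean of a Boolean Series is a proportion, here is a minimal sketch on a small hand-made Series. The name `toy` and its values are made up for illustration and are not taken from the penguins data.

```python
import pandas as pd

def prob(A):
    """Return the fraction of True values in a Boolean Series."""
    return A.mean()

# Hypothetical toy data: 3 of the 4 entries are True
toy = pd.Series([True, False, True, True])

# True counts as 1 and False as 0, so the mean equals count / length
assert prob(toy) == toy.sum() / len(toy)
print(prob(toy))  # 0.75
```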
Conjunction#
A & B
## Adelie and Female
female = (df['sex'] == 'FEMALE')
female.head()
0 False
1 True
2 True
4 True
5 False
Name: sex, dtype: bool
prob(female)
0.4954954954954955
prob(adelie & female)
0.21921921921921922
prob(female & adelie)
0.21921921921921922
# The product P(adelie) * P(female) below differs slightly from prob(adelie & female),
# so in this sample being Adelie and being female are not exactly independent
prob(adelie) * prob(female)
0.21724427129832535
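For contrast, the sketch below builds a tiny table in which the two events are exactly independent, so the conjunction equals the product, and then removes one row to break that balance. The tables `balanced` and `unbalanced` are hypothetical and exist only to illustrate the check.

```python
import pandas as pd

def prob(A):
    """Return the fraction of True values in a Boolean Series."""
    return A.mean()

# Hypothetical table: each species/sex combination appears exactly once
balanced = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Gentoo"],
    "sex": ["FEMALE", "MALE", "FEMALE", "MALE"],
})
a = balanced["species"] == "Adelie"
f = balanced["sex"] == "FEMALE"
joint_balanced = prob(a & f)
product_balanced = prob(a) * prob(f)
print(joint_balanced, product_balanced)  # 0.25 0.25

# Dropping the last row makes the events dependent in the sample
unbalanced = balanced.iloc[:3]
a2 = unbalanced["species"] == "Adelie"
f2 = unbalanced["sex"] == "FEMALE"
joint_unbalanced = prob(a2 & f2)
product_unbalanced = prob(a2) * prob(f2)
print(joint_unbalanced, product_unbalanced)  # the two values now differ
```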
Conditional probability#
What is the probability that a penguin is female, given that it is an Adelie?
female[adelie].head()
0 False
1 True
2 True
4 True
5 False
Name: sex, dtype: bool
adelie[female].head()
1 True
2 True
4 True
6 True
12 True
Name: species, dtype: bool
# number of Adelie
adelie_df = df[df.species == 'Adelie']
adelie_df.shape
(146, 7)
## female in adelie
female_adelie = adelie_df[adelie_df.sex == 'FEMALE']
female_adelie.shape
(73, 7)
female_adelie.shape[0] / adelie_df.shape[0]
0.5
prob(female[adelie])
0.5
prob(adelie[female])
0.44242424242424244
So the probability of A given B can be computed as prob(A[B]). The order matters: P(female | Adelie) is 0.5, whereas P(Adelie | female) is about 0.442.
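The indexing pattern can be wrapped in a small helper. The function name `conditional` and the toy table are assumptions introduced here for illustration; they do not appear in the original notebook.

```python
import pandas as pd

def prob(A):
    """Return the fraction of True values in a Boolean Series."""
    return A.mean()

def conditional(proposition, given):
    """P(proposition | given): keep only rows where `given` is True."""
    return prob(proposition[given])

# Hypothetical toy table: 3 Adelie (1 female), 1 Gentoo (female)
toy = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Adelie", "Gentoo"],
    "sex": ["FEMALE", "MALE", "MALE", "FEMALE"],
})
adelie = toy["species"] == "Adelie"
female = toy["sex"] == "FEMALE"

p_female_given_adelie = conditional(female, given=adelie)  # 1 of 3 Adelie rows
p_adelie_given_female = conditional(adelie, given=female)  # 1 of 2 female rows
print(p_female_given_adelie, p_adelie_given_female)
```

Passing the condition as a keyword argument makes the direction of the conditioning explicit at the call site.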
Some laws#
\[P(A|B) = \frac{P(A \& B)}{P(B)}\]
prob(female[adelie])
0.5
prob(adelie)
0.43843843843843844
prob(female & adelie)
0.21921921921921922
prob(female[adelie]) == prob(female & adelie)/prob(adelie)
True
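The exact `==` comparison returns True here, but division of floating-point numbers can introduce rounding error, so a tolerance-based check such as `np.isclose` is usually safer. The sketch below uses a hypothetical toy table rather than the penguins data.

```python
import numpy as np
import pandas as pd

def prob(A):
    """Return the fraction of True values in a Boolean Series."""
    return A.mean()

# Hypothetical toy table, used only to illustrate the comparison
toy = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Adelie", "Gentoo"],
    "sex": ["FEMALE", "MALE", "MALE", "FEMALE"],
})
adelie = toy["species"] == "Adelie"
female = toy["sex"] == "FEMALE"

lhs = prob(female[adelie])                  # P(female | adelie)
rhs = prob(female & adelie) / prob(adelie)  # P(female & adelie) / P(adelie)

# Compare within a small tolerance instead of demanding bitwise equality
same = bool(np.isclose(lhs, rhs))
print(same)
```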
Multiplying both sides of the definition by P(B) gives the product rule:
\[P(A \& B) = P(B) P(A|B)\]
Swapping the roles of A and B gives:
\[P(B \& A) = P(A) P(B|A)\]
Conjunction is commutative, so:
\[P(A \& B) = P(B \& A)\]
Substituting into the definition of conditional probability gives Bayes's theorem:
\[\begin{aligned} P(A|B) &= \frac{P(A \& B)}{P(B)} \\ &= \frac{P(B \& A)}{P(B)} \\ &= \frac{P(A) P(B|A)}{P(B)}\end{aligned}\]
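Bayes's theorem can be checked numerically in the same style as the earlier cells. This is a minimal sketch on a hypothetical toy table, with A taken to be "female" and B taken to be "Adelie"; the variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def prob(A):
    """Return the fraction of True values in a Boolean Series."""
    return A.mean()

# Hypothetical toy table: 3 Adelie (1 female), 2 Gentoo (both female)
toy = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo"],
    "sex": ["FEMALE", "MALE", "MALE", "FEMALE", "FEMALE"],
})
A = toy["sex"] == "FEMALE"
B = toy["species"] == "Adelie"

direct = prob(A[B])                        # P(A|B) computed directly
bayes = prob(A) * prob(B[A]) / prob(B)     # P(A) P(B|A) / P(B)

# Both routes should agree up to floating-point rounding
agree = bool(np.isclose(direct, bayes))
print(agree)
```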