Probability theorems


import pandas as pd
import numpy as np
df = pd.read_csv("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv")
df = df.dropna()
df.head()
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
0 Adelie Torgersen 39.1 18.7 181.0 3750.0 MALE
1 Adelie Torgersen 39.5 17.4 186.0 3800.0 FEMALE
2 Adelie Torgersen 40.3 18.0 195.0 3250.0 FEMALE
4 Adelie Torgersen 36.7 19.3 193.0 3450.0 FEMALE
5 Adelie Torgersen 39.3 20.6 190.0 3650.0 MALE
df.shape
(333, 7)
set(df.species), set(df.island), set(df.sex)
({'Adelie', 'Chinstrap', 'Gentoo'},
 {'Biscoe', 'Dream', 'Torgersen'},
 {'FEMALE', 'MALE'})
## Fraction of Adelie species
adelie = (df['species'] == 'Adelie')
adelie.head()
0    True
1    True
2    True
4    True
5    True
Name: species, dtype: bool
adelie.sum()
146
# This is the fraction of True values in the Series
# Therefore, it is the fraction of Adelie penguins
adelie.mean()
0.43843843843843844
def prob(A):
    '''Probability of A.

    Input: a Series of True and False values.
    Returns the fraction of values that are True.
    '''
    return A.mean()
prob(adelie)
0.43843843843843844
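prob works because True counts as 1 and False as 0, so the mean of a Boolean Series is the fraction of True values. A minimal check on a small, made-up Series (the values are illustrative only, not taken from the penguins data):

```python
import pandas as pd

def prob(A):
    '''Probability of A, given a Boolean Series A.'''
    return A.mean()

# Hypothetical toy data: 3 of the 4 values are True
toy = pd.Series([True, False, True, True])

print(toy.sum())   # 3    -> number of True values
print(prob(toy))   # 0.75 -> 3 / 4
```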

Conjunction

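A & B denotes the conjunction of A and B: the event that both A and B are true. With Boolean Series, & computes it element-wise.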
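A small sketch of element-wise & on Boolean Series, using a hypothetical four-row table rather than the penguins data. It also shows why each comparison is wrapped in parentheses (& binds more tightly than ==) and that the order of the operands does not matter:

```python
import pandas as pd

# Hypothetical toy table (not the penguins data)
toy = pd.DataFrame({
    'species': ['Adelie', 'Adelie', 'Gentoo', 'Gentoo'],
    'sex':     ['FEMALE', 'MALE',   'FEMALE', 'FEMALE'],
})

# Parentheses are required: & binds more tightly than ==
is_adelie = (toy['species'] == 'Adelie')
is_female = (toy['sex'] == 'FEMALE')

both = is_adelie & is_female          # element-wise AND
print(both.tolist())                  # [True, False, False, False]
print(both.mean())                    # 0.25

# Conjunction is commutative: swapping the operands gives the same Series
print((is_female & is_adelie).equals(both))  # True
```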

## Adelie and Female
female = (df['sex'] == 'FEMALE')
female.head()
0    False
1     True
2     True
4     True
5    False
Name: sex, dtype: bool
prob(female)
0.4954954954954955
prob(adelie & female)
0.21921921921921922
prob(female & adelie)
0.21921921921921922
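# If being Adelie and being female were independent, P(adelie & female) would equal P(adelie) * P(female)
# The product below is close to, but not equal to, P(adelie & female), so in this sample they are not exactly independent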
prob(adelie) * prob(female)
0.21724427129832535
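Two events A and B are independent when P(A & B) = P(A) P(B). The sketch below contrasts an independent pair with a dependent one on four equally likely outcomes; the events A, B and C are made up for illustration:

```python
import pandas as pd

def prob(A):
    '''Probability of A, given a Boolean Series A.'''
    return A.mean()

# Hypothetical toy events on four equally likely outcomes
A = pd.Series([True, True, False, False])
B = pd.Series([True, False, True, False])   # independent of A
C = pd.Series([True, True, True, False])    # not independent of A

print(prob(A & B), prob(A) * prob(B))  # 0.25 0.25  -> equal
print(prob(A & C), prob(A) * prob(C))  # 0.5 0.375  -> different
```

With sample data such as the penguins, the two sides will rarely match exactly even for events that are roughly independent, so the comparison is better read as "close or not" than as a strict equality test.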

Conditional probability

What is the probability that a penguin is female, given that it is an Adelie?

female[adelie].head()
0    False
1     True
2     True
4     True
5    False
Name: sex, dtype: bool
adelie[female].head()
1     True
2     True
4     True
6     True
12    True
Name: species, dtype: bool
# number of Adelie
adelie_df = df[df.species == 'Adelie']
adelie_df.shape
(146, 7)
## Females among Adelie
female_adelie = adelie_df[adelie_df.sex == 'FEMALE']
female_adelie.shape
(73, 7)
female_adelie.shape[0] / adelie_df.shape[0]
0.5
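The same count-and-divide computation can be done for every species at once with pd.crosstab and normalize='index', which divides each row by its row total so that each row holds P(sex | species). A sketch on a hypothetical four-row table:

```python
import pandas as pd

# Hypothetical toy table (not the penguins data)
toy = pd.DataFrame({
    'species': ['Adelie', 'Adelie', 'Adelie', 'Gentoo'],
    'sex':     ['FEMALE', 'MALE',   'FEMALE', 'MALE'],
})

# Each row is divided by its own total, so row r holds P(sex | species = r)
table = pd.crosstab(toy['species'], toy['sex'], normalize='index')
print(table)
print(table.loc['Adelie', 'FEMALE'])  # 2 / 3 = 0.666...
```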
prob(female[adelie])
0.5
prob(adelie[female])
0.44242424242424244

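So the probability of A given B can be computed as prob(A[B]): keep only the rows where B is True, then take the fraction of those rows where A is True. In general prob(A[B]) and prob(B[A]) differ, as the two results above (0.5 and 0.442) show.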
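A minimal sketch that wraps this idea in a helper. The name conditional is hypothetical (it is not defined elsewhere in this notebook), and the toy events are chosen so that P(A|B) and P(B|A) come out different:

```python
import pandas as pd

def prob(A):
    '''Probability of A, given a Boolean Series A.'''
    return A.mean()

def conditional(A, given):
    '''Hypothetical helper: P(A | given) for Boolean Series with the same index.'''
    # Boolean indexing keeps only the rows where `given` is True
    return prob(A[given])

# Hypothetical toy events
A = pd.Series([True, True, True, False, False])
B = pd.Series([True, False, False, False, True])

print(conditional(A, B))  # 0.5    -> A holds in 1 of the 2 rows where B holds
print(conditional(B, A))  # 0.333... -> B holds in 1 of the 3 rows where A holds
```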

Some laws

\[P(A|B) = \frac{P(A \& B)}{P(B)}\]
prob(female[adelie])
0.5
prob(adelie)
0.43843843843843844
prob(female & adelie)
0.21921921921921922
prob(female[adelie]) == prob(female & adelie)/prob(adelie)
True

Multiplying both sides by P(B), we get the multiplication rule

\[P(A \& B) = P(B) P(A|B)\]
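A quick numerical check of the multiplication rule, sketched on made-up toy events rather than the penguins data. np.isclose is used because the two sides are computed by different floating-point operations:

```python
import numpy as np
import pandas as pd

def prob(A):
    '''Probability of A, given a Boolean Series A.'''
    return A.mean()

# Hypothetical toy events
A = pd.Series([True, True, True, False, False])
B = pd.Series([True, False, False, False, True])

lhs = prob(A & B)            # P(A & B) = 1/5
rhs = prob(B) * prob(A[B])   # P(B) P(A|B) = 2/5 * 1/2
print(np.isclose(lhs, rhs))  # True
```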

Swapping the roles of A and B gives

\[P(B \& A) = P(A) P(B|A)\]

Conjunction is commutative:

\[P(A \& B) = P(B \& A)\]

Substituting these into the definition of conditional probability gives Bayes's theorem:

\[\begin{split}\begin{aligned} P(A|B) &= \frac{P(A \& B)}{P(B)} \\ &= \frac{P(B \& A)}{P(B)} \\ &= \frac{P(A) P(B|A)}{P(B)}\end{aligned}\end{split}\]
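As a sanity check, a sketch that computes P(A|B) both directly and through Bayes's theorem on the same made-up toy events (not the penguins data):

```python
import numpy as np
import pandas as pd

def prob(A):
    '''Probability of A, given a Boolean Series A.'''
    return A.mean()

# Hypothetical toy events
A = pd.Series([True, True, True, False, False])
B = pd.Series([True, False, False, False, True])

direct = prob(A[B])                          # P(A|B) computed directly
via_bayes = prob(A) * prob(B[A]) / prob(B)   # P(A) P(B|A) / P(B)

print(direct)                         # 0.5
print(np.isclose(direct, via_bayes))  # True
```

With A = female and B = adelie, the outputs earlier in this notebook agree as well: prob(female) * prob(adelie[female]) / prob(adelie) = 0.4955 × 0.4424 / 0.4384 ≈ 0.5, which matches prob(female[adelie]).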