Q. Describe point-biserial correlation and phi-coefficient.
6
Point-biserial correlation
Some variables in research are dichotomous. A dichotomous variable is one
that can take only one of two sharply distinguished, mutually exclusive
categories. Examples include male-female, rural-urban, Indian-American,
diagnosed with illness versus not diagnosed, and experimental group versus
control group. These are truly dichotomous variables, for which no underlying
continuous distribution can be assumed. They represent categories rather than
measurements.
Now, if we want to correlate such a dichotomous variable with a continuous
variable (e.g., income, test scores, age), applying Pearson’s formula appears
problematic because the dichotomous variable lacks continuity. Pearson’s r is
ordinarily defined for two continuous variables, and its significance test
further assumes that they are normally distributed, which is not the case here.
To address this, we use the point-biserial correlation coefficient (rpb).
It is a special case of the Pearson correlation, used when one variable is
truly dichotomous and the other is continuous. Mathematically, the formula for
rpb is identical to that for Pearson’s r; conceptually, it acknowledges the
categorical nature of one variable.
The dichotomous variable is typically coded as 0 and 1. Any two distinct
values can be used (e.g., 0 and 1, or 5 and 11); the magnitude of the
correlation remains the same, and its sign depends only on which category
receives the higher code.
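The effect of the coding can be checked numerically. The following is a minimal sketch, not a definitive implementation: the helper `pearson_r` and the group and score values are hypothetical, made up purely for illustration. It computes Pearson's r for the same grouping coded as 0/1, as 5/11, and with the codes reversed.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: 0 = control group, 1 = experimental group
group_01 = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [12, 15, 11, 14, 18, 20, 17, 19]

# Same grouping recoded with two other distinct values (5 and 11)
group_5_11 = [5 if g == 0 else 11 for g in group_01]

# Reversed coding: the category that was 0 now receives the higher code
group_reversed = [1 - g for g in group_01]

r_01 = pearson_r(group_01, scores)
r_5_11 = pearson_r(group_5_11, scores)
r_reversed = pearson_r(group_reversed, scores)

print(r_01, r_5_11, r_reversed)
```

The first two values agree, while the third has the same magnitude with the opposite sign, because only the order of the two codes matters.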
Point-biserial correlation (rpb) is thus Pearson’s product-moment
correlation between one truly dichotomous variable and one continuous
variable. Algebraically, rpb = r, so rpb can be calculated with the same
formula as Pearson’s r.
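A commonly used computational form expresses rpb through the two group means: rpb = ((M1 − M0) / s) × √(pq), where M1 and M0 are the mean scores of the groups coded 1 and 0, s is the standard deviation of all scores computed with N in the denominator, and p and q are the proportions of cases in the two groups. The sketch below, which assumes 0/1 coding and uses hypothetical data and function names, checks that this form agrees with Pearson's r.

```python
from math import sqrt
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def point_biserial(groups, scores):
    """rpb from the group means; assumes groups are coded 0 and 1."""
    ones = [s for g, s in zip(groups, scores) if g == 1]
    zeros = [s for g, s in zip(groups, scores) if g == 0]
    p = len(ones) / len(scores)   # proportion of cases coded 1
    q = 1 - p                     # proportion of cases coded 0
    # pstdev divides by N (population form), which this formula requires
    return (mean(ones) - mean(zeros)) / pstdev(scores) * sqrt(p * q)

# Hypothetical data with unequal group sizes
groups = [0, 0, 0, 1, 1, 1, 1, 1]
scores = [10, 12, 9, 14, 13, 16, 15, 12]

r_formula = point_biserial(groups, scores)
r_pearson = pearson_r(groups, scores)
print(r_formula, r_pearson)
```

Both routes give the same value, which is what "rpb = r" means in practice.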
Phi-coefficient
Pearson’s correlation between one dichotomous variable and one continuous
variable is called the point-biserial correlation. When both variables are
dichotomous, the Pearson’s correlation calculated is called the Phi
Coefficient (ϕ).
Use the Phi-coefficient
when both variables are binary (e.g., comparing gender and exam result, or
smoking status and disease status).
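As an illustration, ϕ can be obtained by applying Pearson's r directly to two variables coded 0/1. The sketch below uses hypothetical smoking and disease data and hypothetical function names. It also cross-checks the result against the standard closed form based on the four cell counts, where a, b, c and d are taken here to be the numbers of cases with (A, B) equal to (1, 1), (1, 0), (0, 1) and (0, 0).

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / sqrt(sxx * syy)

def phi_from_counts(a, b, c, d):
    """Phi from the four cell counts of a 2x2 table (standard closed form)."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical data: 1 = smoker / disease present, 0 = otherwise
smoker = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
disease = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# Tally the four cells of the 2x2 table
pairs = list(zip(smoker, disease))
a = pairs.count((1, 1))
b = pairs.count((1, 0))
c = pairs.count((0, 1))
d = pairs.count((0, 0))

phi_pearson = pearson_r(smoker, disease)
phi_counts = phi_from_counts(a, b, c, d)
print(phi_pearson, phi_counts)
```

The two values coincide, since ϕ is simply Pearson's r computed on two dichotomous variables.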
If we organize the data in a 2×2 table:
|                | Variable B = 1 | Variable B = 0 | Row Total |
| Variable A = 1 | a              | b              | a + b     |
| Variable A = 0 | c              | d              | c + d     |
| Column Total   | a + c          | b + d          | N (Total) |
Then the Phi-coefficient (ϕ) is calculated as:
ϕ = (ad − bc) / √[(a + b)(c + d)(a + c)(b + d)]