Understanding the relationship between variables

What is a variable? A variable is a symbolic name for a quantity whose value can change. Correlation is the statistical relationship between two variables, and it can be positive, negative, or even zero.

The main types of relationship are:

• Direct Relationship

Here, the two variables move in the same direction: if one increases, the other also increases, and if one decreases, the other also decreases. For instance, as the number of hours put into a piece of work increases, the effort increases, and so does the appreciation it receives. This is a direct relationship.

https://images.slideplayer.com/21/6284437/slides/slide_5.jpg

• Indirect (or Inverse) Relationship

Here, as one variable increases, the other decreases. Consider, for example, how age can affect pay, which in turn can affect job satisfaction.

https://businessterms.org/wp-content/uploads/2018/11/Inverse-Relationship-610x336.jpg

One way to measure such a relationship is Pearson's correlation coefficient. It is based on the covariance of the two variables, which measures how their values jointly deviate from their expected values, scaled by the product of their standard deviations. Depending on the data, the result can be positive, negative, or zero, and it always lies between −1 and +1.

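The correlation coefficient between two random variables X and Y, with expected values μ_X, μ_Y and non-zero standard deviations σ_X, σ_Y, is defined as follows:

$$\operatorname{cor}(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}$$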

The correlation coefficient is symmetric, that is, cor(X, Y) = cor(Y, X). This follows from the commutative property of multiplication: swapping X and Y leaves both the covariance and the product σ_X σ_Y unchanged.
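As a rough illustration of the definition, here is a minimal Python sketch that computes Pearson's r from scratch and checks the symmetry cor(X, Y) = cor(Y, X). The `pearson` function and the sample data are hypothetical, chosen only for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson's r: covariance divided by the product of the standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (n times the population covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    # The 1/n factors cancel between numerator and denominator
    return cov / (sx * sy)

# Hypothetical data: hours worked vs. a score
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
print(round(pearson(hours, score), 3))                # 0.775
print(pearson(hours, score) == pearson(score, hours))  # True
```

In practice, `statistics.correlation` (Python 3.10+) or `scipy.stats.pearsonr` compute the same quantity.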

Now consider the joint probability distribution of the two variables X and Y, given as a table of probabilities P(X = x, Y = y).

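From this table, the marginal distribution of each variable is obtained by summing the joint probabilities over all values of the other variable:

$$P(X = x) = \sum_{y} P(X = x, Y = y), \qquad P(Y = y) = \sum_{x} P(X = x, Y = y)$$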

In the original example, this calculation led to a final value of 0.
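Because the article's table is not reproduced here, the sketch below uses a small hypothetical joint distribution to show how the marginals are obtained by summing over the other variable. Exact fractions (`fractions.Fraction`) keep the probabilities free of rounding error; the `joint` table and the `marginals` helper are illustrative only.

```python
from fractions import Fraction

# Hypothetical joint distribution P(X = x, Y = y); not the article's table
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(1, 8),
    (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 2),
}

def marginals(joint):
    """Return P(X = x) and P(Y = y) by summing the joint table over the other variable."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0) + p  # sum over y for this x
        py[y] = py.get(y, 0) + p  # sum over x for this y
    return px, py

px, py = marginals(joint)
print(px)
print(py)
```

Each marginal should sum to 1, which is a quick sanity check on any joint table.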

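Spearman’s rank correlation measures how consistently one variable increases (or decreases) as the other increases, regardless of by how much; in other words, how well the relationship can be described by a monotonic function. It is equivalent to Pearson's correlation computed on the ranks of the values rather than the values themselves.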

For instance, consider the following (X, Y) pairs:

(0, 1), (10, 100), (101, 500), (102, 2000)

Comparing each pair with the preceding one, Y increases every time X increases. The ranks of X and Y therefore agree exactly, so Spearman’s rank correlation is 1, even though the relationship is not linear.
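The following minimal sketch computes Spearman's rank correlation for the four pairs above using the shortcut formula ρ = 1 − 6Σd²/(n(n² − 1)), where d is the difference between the ranks of each pair. That formula assumes there are no tied values, which holds for this data; with ties, one would instead compute Pearson's r on the averaged ranks. The helper names are hypothetical.

```python
def ranks(values):
    """Rank values from 1..n (assumes no ties, as in the example)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_no_ties(xs, ys):
    """Spearman's rho via 1 - 6*sum(d^2) / (n(n^2 - 1)); valid only without ties."""
    n = len(xs)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))

xs = [0, 10, 101, 102]
ys = [1, 100, 500, 2000]
print(spearman_no_ties(xs, ys))  # 1.0
```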

Another way to examine the relationship is a significance test on the sample correlation coefficient r, which uses a t-test with the statistic

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}$$

Here, the test statistic follows a t-distribution with n − 2 degrees of freedom, where n is the number of (X, Y) pairs. The assumptions made are:

• Both variables are normally distributed

• There is a linear relationship between the variables

The null hypothesis states that there is no linear association between the variables, i.e. the population correlation is zero.
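A minimal sketch of the test statistic, assuming r and n are already known; the values r = 0.6 and n = 27 are hypothetical, and `correlation_t_statistic` is an illustrative name.

```python
import math

def correlation_t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), compared against a t-distribution with n - 2 df."""
    if n < 3:
        raise ValueError("need at least 3 pairs")
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Hypothetical values: r = 0.6 observed on n = 27 pairs
t = correlation_t_statistic(0.6, 27)
print(round(t, 2), "with", 27 - 2, "degrees of freedom")
```

To turn t into a two-sided p-value, one option is SciPy: `2 * scipy.stats.t.sf(abs(t), n - 2)`. `scipy.stats.pearsonr` also reports a p-value directly.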

Next, we can calculate the regression equation, which describes how much Y changes for a given change in X. Plotted on a scatter diagram, the regression line is a straight line, and the sign of its slope gives the direction of the relationship (positive, negative, or zero). For the least-squares line y = a + bx, the slope and intercept are:

$$b = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i}(x_i - \bar{x})^2}, \qquad a = \bar{y} - b\,\bar{x}$$
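Here is a minimal sketch of ordinary least squares for a single predictor, following the standard least-squares formulas for the slope and intercept. The data points are hypothetical and lie exactly on y = 1 + 2x, so the result is easy to check.

```python
def least_squares(xs, ys):
    """Fit y = a + b*x by ordinary least squares; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope: its sign gives the direction of the relationship
    a = mean_y - b * mean_x  # intercept: the line passes through (mean_x, mean_y)
    return a, b

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
a, b = least_squares(xs, ys)
print(a, b)  # 1.0 2.0
```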

Using a fitted regression equation of y = 1.033x − 84.2, we can predict y for given values of x:

If x = 110, y = (1.033 × 110) − 84.2 = 29.43

If x = 140, y = (1.033 × 140) − 84.2 = 60.42

If x = 170, y = (1.033 × 170) − 84.2 = 91.41
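Evaluating the stated equation in code is an easy way to check the hand arithmetic. In this minimal sketch, `predict` is a hypothetical helper whose defaults are simply the slope and intercept of y = 1.033x − 84.2, not a newly fitted model.

```python
def predict(x, slope=1.033, intercept=-84.2):
    """Evaluate the regression line y = slope * x + intercept from the example."""
    return slope * x + intercept

for x in (110, 140, 170):
    print(x, round(predict(x), 2))
```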

Plotted on a graph, these points lie on a straight line.

Linear regression of this kind assumes that the relationship between the variables is linear and that the errors (residuals) are approximately normally distributed; the scatter plot shows the slope of the fitted line.

Returning to the types of correlation:

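A positive correlation means the points tend to follow a line with a positive slope: when one variable increases, the other also tends to increase. If all the points lie exactly on such a line, the correlation is +1.

A negative correlation means the points tend to follow a line with a negative slope, reflecting an indirect (inverse) relationship between the variables. If all the points lie exactly on such a line, the correlation is −1.

Zero (or no) correlation means there is no linear relationship between the variables, so the best-fitting line is flat. The variables may still be related in a non-linear way.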

https://www.simplypsychology.org/correlation.jpg

These are some of the ways to determine the relationship between variables, both graphically and through formulas for the various correlation measures.

Written by: Raghav K

Connect with me on LinkedIn: https://www.linkedin.com/in/raghav-k-8b47ab1b8

Data Analytics Club, VIT Chennai

Curious about developments in Data Science? You are at the right place! Follow us at our LinkedIn page: https://www.linkedin.com/company/dac-vit-chennai/