A Statistical Analysis: Age Is Just A Number
Introduction: The 2004 PGA (Professional Golfers' Association) Championship was the 86th PGA Championship, played August 12–15 at the Straits Course of the Whistling Straits complex in Haven, Wisconsin. We present a brief analysis of the data collected for each player during the championship matches.
A few terms required for this study are defined below:
Drive: Also known as a tee shot, a long-distance shot played from the tee box.
Putts: Strokes made on a putting green to cause the ball to roll into or near the hole.
Greens in Regulation (%): The green, or putting green, is the culmination of a golf hole, where the flag stick and hole are located. Greens in Regulation is the percentage of holes on which a player was able to hit the green in regulation. A green is considered hit in regulation if any portion of the ball is touching the putting surface after the GIR (Green In Regulation) stroke has been taken, where the GIR stroke is the stroke numbered par minus two.
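The Greens in Regulation definition can be illustrated with a small sketch. It assumes the usual convention that the GIR stroke is par minus two, so a green is hit in regulation when the ball reaches the putting surface in at most that many strokes. The function names and the hole data are hypothetical and are not taken from the championship data.

```python
def hit_in_regulation(par, strokes_to_reach_green):
    """True if the green was reached in at most (par - 2) strokes."""
    return strokes_to_reach_green <= par - 2


def gir_percentage(holes):
    """Percentage of holes on which the green was hit in regulation.

    holes: list of (par, strokes_to_reach_green) pairs.
    """
    hits = sum(hit_in_regulation(par, strokes) for par, strokes in holes)
    return 100.0 * hits / len(holes)


# Hypothetical four-hole sample, for illustration only
sample_holes = [(4, 2), (3, 1), (5, 4), (4, 3)]
print(gir_percentage(sample_holes))
```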
Objective
 Understand which factors significantly affect money earnings, and devise a model that predicts a player's total earnings from those factors.
 Analyse the performance of all the players and see whether age affects a player's performance.
Methodology:
(For objective 1)
 First, scatter plots are created with total earnings on the Y-axis and each of the other factors on the X-axis. This helps us visualize which factors affect a player's money earnings.

Based on the visualization, we choose the independent variables for the multiple regression model.
(The choice can also be made from the partial correlation coefficient between earnings and each variable, with the effect of the other variables removed. Both approaches yield the same result.)
The model is
Y = β'X + e
where
β is the regression coefficient vector,
X is the vector of the different factors,
e is the error.
We fit the multiple regression model by the method of least squares, which chooses the coefficients that minimize the sum of squared errors.
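The least-squares fit can be sketched in plain Python as below. This is a minimal illustration, not the procedure actually used for the study: it solves the normal equations by Gaussian elimination, the function name fit_least_squares is hypothetical, and the toy data are made up so that the true coefficients are known.

```python
def fit_least_squares(rows, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    rows: one list of predictor values per observation.
    y:    list of responses.
    Returns the coefficients [b0, b1, ..., bk].
    """
    # Design matrix with a leading 1 for the intercept term
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; X'X is singular")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Toy data generated from y = 3 + 2*x1 - 1*x2 (illustrative, not golf data)
toy_rows = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
toy_y = [3 + 2 * a - b for a, b in toy_rows]
print(fit_least_squares(toy_rows, toy_y))
```

On real data one would normally use a numerical library rather than hand-rolled elimination; the sketch only shows what "minimizing the error by least squares" computes.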
(For objective 2)
 Scatter plots are created with age on the X-axis and each of the other factors on the Y-axis. This helps us visualize how the factors vary with a player's age.
OBJECTIVE 1
Analysis: (Please see the data file attached below)
Now, for simplicity and to reduce the number of variables in the study, we consider average winnings instead of total winnings, since average winnings combines two variables (total winnings and number of events) into one. Below are the scatter plots of the other variables against average winnings.
From the above diagrams we can see that every variable except age affects a player's average earnings. Money rank, although it has a negative association with average winnings, is a categorical (rank) type of variable and is not treated as a factor affecting a player's money earnings.
So, we exclude the variables money rank (x_{6}) and age of the player (x_{7}) from the regression model.
So, the final regression model becomes:
Y = β_{0} + β_{1}x_{1} + β_{2}x_{2} + β_{3}x_{3} + β_{4}x_{4} + β_{5}x_{5} + e
where
Y = Average winnings (dependent variable)
x_{1} = Average Drives (yards)
x_{2} = Driving Accuracy (%)
x_{3} = Greens in Regulation (%)
x_{4} = Average number of putts
x_{5} = Save percent (%)
The variables to include in the regression model can also be identified by calculating the partial correlation coefficient of each variable with average winnings. In a nutshell, a partial correlation measures the association between two variables after removing the effects of the other variables. We study the partial correlation coefficients between average winnings (Y) and the other variables, and decide which variables to include based on the magnitude of each coefficient. The partial correlation coefficient of Y with age (x_{7}), keeping the other factors (x_{1}, x_{2}, x_{3}, x_{4}, x_{5}) constant, is
r = 0.032361
which is very small compared with the partial correlation coefficients of the other variables with average winnings. The partial correlation coefficients of average winnings with each variable are:
Variable                  Partial correlation with Avg Winnings
Avg Drives (Yards)        0.113219
Driving Accuracy (%)      0.18619
Greens in Regulation      0.135177
Average No. of Putts      0.06903
Save %                    0.105308
Money Rank                0.50523
Age                       0.032361
So, we exclude the age variable (x_{7}) from our study.
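For readers who want to reproduce this kind of calculation, a partial correlation can be built up recursively from ordinary (Pearson) correlations, removing one control variable at a time. The sketch below is one standard way to do it and is not necessarily how the figures above were obtained. The function names and the toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Ordinary (zero-order) correlation coefficient of two lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_corr(data, a, b, controls=()):
    """Correlation of data[a] and data[b] with the controls held constant.

    Uses the recursion
    r_ab.kS = (r_ab.S - r_ak.S * r_bk.S) / sqrt((1 - r_ak.S^2)(1 - r_bk.S^2)),
    peeling off one control variable k at a time.
    """
    if not controls:
        return pearson(data[a], data[b])
    k, rest = controls[0], tuple(controls[1:])
    r_ab = partial_corr(data, a, b, rest)
    r_ak = partial_corr(data, a, k, rest)
    r_bk = partial_corr(data, b, k, rest)
    return (r_ab - r_ak * r_bk) / sqrt((1 - r_ak ** 2) * (1 - r_bk ** 2))


# Toy data: x and y are related only through z (illustrative, not golf data)
toy = {
    "z": [1, 1, -1, -1],
    "x": [2, 0, 0, -2],
    "y": [2, 0, -2, 0],
}
print(pearson(toy["x"], toy["y"]))          # plain correlation of x and y
print(partial_corr(toy, "x", "y", ("z",)))  # correlation with z held constant
```

In the toy data the plain correlation between x and y comes entirely from z, so it vanishes (up to rounding) once z is held constant. The same logic underlies dropping a variable whose partial correlation with average winnings is close to zero.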
Preparation of Model:
Y = 916899.2 - 197.6 x_{1} - 2770.7 x_{2} + 8918.5 x_{3} - 727285.7 x_{4} + 1600.9 x_{5}
Using this model, if we know the values of the variables (x_{1}, x_{2}, x_{3}, x_{4}, x_{5}) for a player, we can predict that player's average winnings (Y).
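As a rough illustration of how such a prediction would be computed, the sketch below plugs a player's statistics into the fitted equation. Two things are assumptions: the coefficient signs (terms printed without an explicit plus sign are read as negative), and the player's statistics, which are invented for illustration. The names used are hypothetical.

```python
# Coefficients read from the fitted model. ASSUMPTION: terms that appear
# without an explicit "+" in the printed equation are taken to be negative.
INTERCEPT = 916899.2
COEFFICIENTS = {
    "avg_drive_yards": -197.6,        # x1
    "driving_accuracy_pct": -2770.7,  # x2
    "greens_in_regulation": 8918.5,   # x3
    "avg_putts": -727285.7,           # x4
    "save_pct": 1600.9,               # x5
}


def predict_average_winnings(player):
    """Evaluate Y = b0 + b1*x1 + ... + b5*x5 for one player's statistics."""
    return INTERCEPT + sum(
        coef * player[name] for name, coef in COEFFICIENTS.items()
    )


# Hypothetical player; these values are not from the championship data
player = {
    "avg_drive_yards": 290.0,
    "driving_accuracy_pct": 65.0,
    "greens_in_regulation": 65.0,
    "avg_putts": 1.75,
    "save_pct": 50.0,
}
print(round(predict_average_winnings(player), 2))
```

Because the coefficient on the average number of putts is so large, small changes in that one input move the prediction a great deal, so the inputs should be on the same scale as the data used to fit the model.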
OBJECTIVE 2: “Age Is Just A Number.”
From the scatter plots of the other factors against age, we can also see that age does not have much effect on the other variables.
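A numeric companion to reading the scatter plots, offered only as a sketch and not as part of the original analysis, is to fit a straight line of each factor on age and look at the slope and R-squared. Values near zero are consistent with age having little effect. The function name and the toy numbers below are hypothetical.

```python
def slope_and_r_squared(x, y):
    """Slope of the least-squares line of y on x, and its R-squared."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, r_squared


# Hypothetical ages and average putts, for illustration only
ages = [25, 30, 35, 40, 45]
avg_putts = [1.78, 1.75, 1.79, 1.76, 1.78]
slope, r2 = slope_and_r_squared(ages, avg_putts)
print(slope, r2)
```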