A Statistical Analysis of “Age Is Just A Number”

Introduction: The 2004 PGA (Professional Golfers' Association) Championship was the 86th PGA Championship, played August 12–15 at the Straits Course of the Whistling Straits complex in Haven, Wisconsin. We have carried out a brief analysis of the data collected on each player during the championship.

A few terms used in this study are defined below:

Drive: also known as a tee shot; a long-distance shot played from the tee box.

Putt: a stroke made on the putting green to roll the ball into or near the hole.

Greens in Regulation (%): The green (or putting green) is the culmination of a golf hole, where the flagstick and the hole are located. Greens in Regulation is the percentage of holes on which a player hit the green in regulation. A green is considered hit in regulation if any portion of the ball is touching the putting surface after the GIR (Green in Regulation) stroke has been taken; the GIR stroke is two strokes fewer than par (the 1st stroke on a par 3, the 2nd on a par 4 and the 3rd on a par 5).
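A minimal sketch of how the GIR percentage could be computed from hole-by-hole records, assuming each record stores the hole's par and the stroke after which the ball first touched the putting surface. The `HoleResult` type and the `greens_in_regulation_pct` function are hypothetical names for illustration; they are not part of the original analysis, which worked with season-level percentages.

```python
from dataclasses import dataclass


@dataclass
class HoleResult:
    """Hypothetical hole-by-hole record (assumed data layout)."""
    par: int               # par of the hole: 3, 4 or 5
    strokes_to_green: int  # stroke after which the ball first touched the putting surface


def greens_in_regulation_pct(holes: list[HoleResult]) -> float:
    """Percentage of holes where the green was reached by the GIR stroke (par - 2)."""
    if not holes:
        raise ValueError("no holes recorded")
    hits = sum(1 for h in holes if h.strokes_to_green <= h.par - 2)
    return 100.0 * hits / len(holes)


# Example: green hit on the par 3, missed on the first par 4, hit on the par 5 and the second par 4.
round_holes = [HoleResult(3, 1), HoleResult(4, 3), HoleResult(5, 2), HoleResult(4, 2)]
print(greens_in_regulation_pct(round_holes))  # 75.0
```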


Objectives:

  1. Understand which factors significantly affect a player's money earnings, and build a model that predicts a player's Total Earnings from those factors.
  2. Analyse the performance of all the players and determine whether age affects a player's performance.


Methodology:

(For objective 1)

  • First, scatter plots are drawn with Total Earnings on the Y-axis and each of the other factors on the X-axis. This helps us visualize which factors affect a player's money earnings.
  • Based on this visualization, we choose the independent variables of the multiple regression model.

    (The same choice can also be made from the partial correlation coefficient between earnings and each variable, controlling for the effects of the other variables. Both approaches give the same result.)


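    The multiple regression model is

    $$Y = \beta^{T} x + e$$

    where β is the regression coefficient vector,

    x is the vector of the different factors (with a leading 1 for the intercept), and

    e is the error.

    The coefficients β are estimated by the least squares method, that is, by minimizing the sum of the squared errors.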
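The least squares fit can be sketched with NumPy's `np.linalg.lstsq`, assuming the factors are arranged as the columns of a matrix with one row per player. The helper name `fit_ols` is hypothetical, and the data below are synthetic and noise-free (not the championship data), chosen only so we can check that the fit recovers known coefficients.

```python
import numpy as np


def fit_ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least squares estimate of [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk + e."""
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    # lstsq minimizes ||design @ beta - y||^2, i.e. the sum of squared errors.
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


# Synthetic example: 50 "players", 3 factors, known coefficients, no noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_beta = np.array([10.0, 2.0, -1.5, 0.5])
y = true_beta[0] + X @ true_beta[1:]
print(fit_ols(X, y))  # approximately recovers true_beta
```

With real data, `X` would hold the chosen factors as columns and `y` the players' winnings; the returned vector is then the intercept followed by one coefficient per factor.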

(For objective 2)

  • Scatter plots are drawn with Age on the X-axis and each of the other factors on the Y-axis. This helps us visualize how these factors vary with a player's age.
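As a numerical complement to these scatter plots (an addition for illustration, not part of the original method), the Pearson correlation between age and each factor can be computed. The function `age_correlations` and the sample values below are hypothetical; with the study's data, each factor would be one column of the attached data file.

```python
import numpy as np


def age_correlations(age, factors: dict) -> dict:
    """Pearson correlation between age and each performance factor."""
    age = np.asarray(age, dtype=float)
    return {
        name: float(np.corrcoef(age, np.asarray(values, dtype=float))[0, 1])
        for name, values in factors.items()
    }


# Hypothetical values for four players, chosen to give perfect +1 and -1 correlations.
ages = [25, 30, 35, 40]
example = {"factor_a": [1, 2, 3, 4], "factor_b": [8, 6, 4, 2]}
print(age_correlations(ages, example))
```

A coefficient near 0 for a factor corresponds to a scatter plot with no visible trend against age.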


Analysis: (Please see the data file attached below)

For simplicity, and to reduce the number of variables in the study, we use Average Winnings instead of Total Winnings, since average winnings combines two variables (total winnings and number of events) into one. Below are the scatter plots of the other variables against Average Winnings.

From the above diagrams we can see that all the variables except Age affect a player's average earnings. Money rank also shows a negative association with average winnings, but it is a categorical (rank) variable that is itself determined by a player's earnings, so it does not affect the money earnings of a player.

So, we exclude the variables Money rank (x6) and Age (x7) from the regression model.

So, the final regression model becomes:

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + e$$

where

Y = Average winnings (dependent variable)

x1 = Average drive (yards)

x2 = Driving accuracy (%)

x3 = Greens in regulation (%)

x4 = Average number of putts

x5 = Save percent (%)

The variables to include in the regression model can also be identified by calculating the partial correlation coefficient of each variable with the average winnings. In a nutshell, the partial correlation between two variables measures their association after removing the effects of the other variables. We compute the partial correlation coefficients between average winnings (y) and the other variables (x1, x2, x3, x4, x5), and decide whether to include each variable based on the magnitude of its coefficient. The partial correlation coefficient between Y and age (x7), holding all the other factors (x1, x2, x3, x4, x5) constant, is:

r = 0.032361

This is very small compared with the partial correlation coefficients of the other variables with average winnings, which are listed below:


Avg Drives (Yards)

Driving Accuracy (%)

Greens On Regulation

Average No. Of Putts


Money Rank


Avg Winnings








So, we exclude the age variable (x7) from the model.
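The text does not say how r was computed. One common approach, assumed here, is to regress both variables on the control variables by least squares and correlate the two residual vectors. `partial_corr` is a hypothetical helper, and the check below uses synthetic data in which y depends on x exactly once z is held fixed.

```python
import numpy as np


def partial_corr(y, x, controls) -> float:
    """Partial correlation of y and x, controlling for the columns of `controls`.

    Both y and x are regressed on the controls (plus an intercept) by least
    squares; the partial correlation is the correlation of the residuals.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    Z = np.column_stack([np.ones(len(y)), np.asarray(controls, dtype=float)])
    res_y = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    res_x = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    return float(np.corrcoef(res_y, res_x)[0, 1])


# Synthetic check: z is a confounder shared by x and y.
rng = np.random.default_rng(1)
z = rng.normal(size=100)
x = rng.normal(size=100) + z
y = 2 * x + 3 * z
print(partial_corr(y, x, z))    # approximately 1.0
print(np.corrcoef(y, x)[0, 1])  # smaller: the plain correlation is diluted by z
```

With the study's data, the coefficient for age would correspond to `partial_corr(avg_winnings, age, X)`, where `X` holds x1 to x5 as columns (hypothetical variable names).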

Preparation of the Model:

$$\hat{Y} = 916899.2 - 197.6x_1 - 2770.7x_2 + 8918.5x_3 - 727285.7x_4 + 1600.9x_5$$

Using this model, if we know the values of the variables (x1, x2, x3, x4, x5) for a player, we can predict the player’s Average Winnings (Y).
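The fitted equation can be wrapped in a small prediction function. This is a sketch: the function name is hypothetical, the inputs must be in the same units as the data used for fitting (which the text does not fully state, for example for the average number of putts), and the example player's values are invented for illustration.

```python
# Coefficients of the fitted model: intercept, then x1..x5.
INTERCEPT = 916899.2
COEFS = (-197.6, -2770.7, 8918.5, -727285.7, 1600.9)


def predict_avg_winnings(x1: float, x2: float, x3: float, x4: float, x5: float) -> float:
    """Predicted average winnings for one player from the five factors."""
    return INTERCEPT + sum(b * x for b, x in zip(COEFS, (x1, x2, x3, x4, x5)))


# Hypothetical player: 290-yard drives, 65% accuracy, 68% GIR, 1.75 putts, 50% saves.
print(predict_avg_winnings(290, 65, 68, 1.75, 50))
```

Keeping the coefficients in one tuple makes it easy to swap in a refitted model without touching the prediction logic.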

“Age Is Just A Number.”

The scatter plots above also show that age does not affect the other variables much.




Mathematica-city is an online education forum for science students run by Kounteyo, Shreyansh and Souvik. We aim to provide articles related to Actuarial Science, Data Science, Statistics, Mathematics and their applications using different statistical software. Feel free to reach out to us for any kind of discussion on any of the related topics.
