I am a Data Scientist with over 7 years of experience applying analytics to uncover insights and drive business results. I have a strong background in product analytics, primarily with sales and marketing teams, and growing NLP experience.
I am skilled in data analysis and storytelling, advanced statistical methods, and deploying scalable solutions, with a proven track record of researching and implementing innovative solutions to business problems.
This site showcases my early projects: decision trees to discover the factors behind football game attendance, document classification with text mining, and Twitter sentiment analysis.
Other projects I've done include lead prioritization with logistic regression, sales forecasting with ARIMA time series modeling, model-year life cycle prediction using various time series methods, and marketing optimization using multiple regression. Download my resume below for more info!
Please feel free to email me at chrisgbergin@gmail.com. Thank you! Looking forward to connecting.
Goal:
Determine the factors that affect attendance of Georgia Southern students at their home football game vs. Appalachian State.
There are 20,550 instances, the total number of Georgia Southern University (GSU) students enrolled as of September 25, 2014 (the date of the GSU vs. Appalachian State game). The matchup is a historic rivalry, which makes it an interesting game to analyze. There were originally 43 attributes, which we reduced to the 18 displayed below.
Georgia Southern's Event Management System records when a student swipes their Eagle ID card to gain entry to a football game. Attendance data is collected for event statistics and NCAA reporting. This binary outcome becomes our FOOTBALL attribute that we used to classify students in the dataset.
Georgia Southern's Banner System is used to gather demographic data about students. The Banner data was joined to the Event Management System data on the student's Eagle ID number, which was replaced with a generic INSTANCE number for confidentiality. We received Institutional Review Board (IRB) approval under exempt status because the data was anonymous, which allowed us to research and present our findings.
Attribute | Description | Attribute | Description |
---|---|---|---|
INSTANCE | Student’s unique identifier | FOOTBALL | Did the student attend the game? |
HOUSING INDICATOR | If the student lives on campus or not | MEAL PLAN | Which meal plan the student has if any |
AGE | The student’s age | ETHNIC DESC | The student’s ethnicity |
CURRENT TERM CREDIT HOURS | The number of credit hours the student is taking this term | LEVEL CODE | Type of degree that the student is pursuing (undergraduate, graduate, doctoral) |
DEGREE DESC | The degree that the student is pursuing | COLLEGE DESC | The college the student is a part of |
MAJOR DESC | The student’s major | EXP GRAD DATE | The student’s expected graduation term |
FIRST TERM | The term the student started at school | CLASSIFICATION DESC | Student’s year in school |
HOURS TOTAL | The number of credit hours the student has completed | OVERALL GPA | The student’s GPA |
SOR_FRAT STATUS | If the student is a part of a sorority or fraternity | SEX | The student’s gender |
For attributes with many missing values, such as SAT TOTAL and OVERALL GPA, we used mean replacement. For attributes such as MEAL PLAN, where a missing value just means the student did not opt in to an optional university plan, we replaced missing values with “None”.
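The cleaning itself was done in SQL Server, but as a rough R sketch of the same imputation logic (the data frame name `students` and the underscore-style column names are assumptions based on the attribute table above):

```r
# Hypothetical sketch of the missing-value handling; `students` is the joined
# Banner/Event data with columns named after the attribute table above.

# Mean replacement for numeric attributes with many missing values
students$OVERALL_GPA[is.na(students$OVERALL_GPA)] <- mean(students$OVERALL_GPA, na.rm = TRUE)
students$SAT_TOTAL[is.na(students$SAT_TOTAL)]     <- mean(students$SAT_TOTAL, na.rm = TRUE)

# A missing MEAL_PLAN simply means the student did not opt in, so code it as "None"
students$MEAL_PLAN[is.na(students$MEAL_PLAN)] <- "None"
```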
We used Microsoft SQL Server (managed through SQL Server Management Studio, SSMS) as our database engine and SQL Server Analysis Services (SSAS) for developing the data mining models.
We used 70 percent of the data to create the model and the remaining 30 percent for testing the model's performance.
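In R, an equivalent holdout split might look like this sketch (the actual split was configured in SSAS; `students` is the assumed cleaned data frame from above):

```r
# Illustrative 70/30 holdout split of the cleaned student data
set.seed(2014)                                   # make the split reproducible
n          <- nrow(students)
train_rows <- sample(n, size = round(0.7 * n))   # 70% of instances
train      <- students[train_rows, ]             # used to build the model
test       <- students[-train_rows, ]            # held out to test performance
```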
Based on Bayes' theorem, the Naive Bayes algorithm handles categorical data, missing values, and outliers well, and it makes a good exploratory model because it indicates which input attributes are most important in predicting the output.
The accompanying attribute profiles are an example of the output of the Bayes algorithm. They visually display how different states of the input attributes (Banner data) affect the outcome of the classification attribute (Football Attendance).
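The model itself was built in SSAS, but a minimal R equivalent using the e1071 package could look like the sketch below (the `train`/`test` frames come from the split above; excluding INSTANCE is my assumption, since it is only an identifier):

```r
library(e1071)   # provides naiveBayes()

# Fit Naive Bayes on the training split; FOOTBALL is the attendance outcome
nb_model <- naiveBayes(FOOTBALL ~ . - INSTANCE, data = train)

# Predict attendance for the 30% holdout and inspect the confusion matrix
nb_pred <- predict(nb_model, newdata = test)
table(Predicted = nb_pred, Actual = test$FOOTBALL)
```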
Decision trees are easy to understand and visualize, and they work well with categorical data.
This flow diagram is the main result of the Microsoft Decision Tree algorithm. Each rectangle is a node, containing a miniature bar graph that shows what percentage of each Football Attendance classification falls inside it. Each node corresponds to a set of rules, or conditions, that an instance must meet to end up in that particular node.
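Outside of SSAS, the same idea can be sketched in R with the rpart package (again assuming the `train`/`test` split and column names from above):

```r
library(rpart)        # CART-style decision tree
library(rpart.plot)   # simple tree plotting

# Grow a classification tree for football attendance, dropping the ID column
tree_model <- rpart(FOOTBALL ~ . - INSTANCE, data = train, method = "class")

# Visualize the nodes and their attendance proportions
rpart.plot(tree_model)

# Evaluate on the 30% holdout
tree_pred <- predict(tree_model, newdata = test, type = "class")
mean(tree_pred == test$FOOTBALL)   # holdout accuracy
```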
A student’s age was the primary factor in determining if they attended the football game. Students under 21 years of age were more likely to attend the game.
The next factor was a student's classification (Freshman, Sophomore, etc.). Attendance decreases with progression through college. It was interesting to see that a student's age and classification were not always related; there are many non-traditionally aged students at GSU.
The third influencing factor on GSU student football attendance was the number of credit hours taken in the semester. Students taking 12 or more hours (the minimum to be considered a full-time student) were more likely to attend.
We suggested marketing targeted towards older students, upperclassmen, and non-traditional students because they are the ones less likely to go to the games.
My group submitted our research paper, titled Analysis of Georgia Southern University Student College Football Attendance, to the Southeast Decision Sciences Institute (SEDSI) conference in Savannah, GA and won the Best Paper Award in Undergraduate Student Research. It was an incredible experience to present at a conference as undergrads, and Georgia Southern even honored us with an article.
Goal:
Automatically classify documents based on their content using machine learning text analytics.
Textron is a Fortune 500 multi-industry conglomerate whose businesses include military products and government work. Some of its documents contain sensitive material and are subject to export regulations (controlled), while others are fine for anyone to view (uncontrolled). Misclassifying these documents can lead to legal action, which costs the company time and money, and to red flags in the marketplace, which reduce business opportunities. Because of these consequences, the people who manually classify documents tend to overclassify them as containing restricted information to stay on the safe side, which withholds data from people who could make use of it.
My goal was to address these issues by developing a model that could predict the classification of a document (Controlled vs. Uncontrolled) within the target accuracy range of 65-75%, as seen in the accompanying figure.
A collection of documents known as a corpus was gathered containing both controlled and uncontrolled documents. We assumed that the documents were classified correctly to begin with and that there are determining factors of classification within the contents of the documents. I incorporated Apache Tika to convert the corpus into plain text so the model could work more efficiently.
Documents that are known to be controlled are stamped with a disclaimer stating that they contain sensitive data (see the figure below). This is one example of something that must be completely removed during preprocessing so it does not bias the way the model is trained.
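A rough sketch of that preprocessing in R (the Tika command-line app, file names, and disclaimer pattern below are placeholders, not Textron's actual setup or wording):

```r
# Convert a document to plain text with the Apache Tika command-line app
# (assumes tika-app.jar has been downloaded; the jar path and file name are placeholders)
raw_lines <- system2("java",
                     args   = c("-jar", "tika-app.jar", "--text", "document.pdf"),
                     stdout = TRUE)
raw_text  <- paste(raw_lines, collapse = "\n")

# Strip the export-control disclaimer stamp so it cannot bias training.
# The pattern below is a made-up placeholder for the real disclaimer wording.
clean_text <- gsub("THIS DOCUMENT CONTAINS EXPORT[- ]CONTROLLED TECHNICAL DATA[^\n]*",
                   "", raw_text, ignore.case = TRUE)
```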
The models were developed using both the R programming language and a free GUI analytics program called RapidMiner.
Just like the development of the Football Attendance model, I used 70 percent of the data to create the model and the remaining 30 percent for testing the model's performance.
The models are based on a multidimensional space in which each word is its own axis and a document's weight for each word is its coordinate on that axis. Taken together, these coordinates place each document as a single point in the space.
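In R, that word-as-axis representation corresponds to a weighted document-term matrix, which the tm package can build; a sketch, assuming the Tika plain-text output sits in a local `corpus/` folder:

```r
library(tm)

# Load the plain-text corpus produced by the Tika step (folder name assumed)
docs <- VCorpus(DirSource("corpus/", encoding = "UTF-8"))

# Basic cleanup before weighting the words
docs <- tm_map(docs, content_transformer(tolower))
docs <- tm_map(docs, removePunctuation)
docs <- tm_map(docs, removeNumbers)
docs <- tm_map(docs, removeWords, stopwords("english"))

# Each document becomes a point whose coordinates are TF-IDF word weights
dtm <- DocumentTermMatrix(docs, control = list(weighting = weightTfIdf))
```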
The K Nearest Neighbor algorithm makes a prediction based on where the evaluated case falls among its "neighbors", the past cases. The majority classification of the closest neighbors becomes the predicted classification of the evaluated case.
Support Vector Machines find an optimized divider that splits the space into two regions, one per classification. The region that the evaluated case falls in becomes its predicted classification.
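Both classifiers can be sketched in R on that document-term matrix (`dtm` from the sketch above; `doc_labels`, a factor of Controlled/Uncontrolled per document, is assumed):

```r
library(class)   # knn()
library(e1071)   # svm()

X <- as.matrix(dtm)                  # documents as points in the word space
set.seed(42)
train_idx <- sample(nrow(X), size = round(0.7 * nrow(X)))   # 70/30 split

# K Nearest Neighbors: majority vote of the k closest training documents
knn_pred <- knn(train = X[train_idx, ], test = X[-train_idx, ],
                cl = doc_labels[train_idx], k = 5)

# Support Vector Machine: learn the separating boundary, then predict the side
svm_model <- svm(x = X[train_idx, ], y = doc_labels[train_idx], kernel = "linear")
svm_pred  <- predict(svm_model, X[-train_idx, ])
```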
Some of the analysis of the results included: top influential words of each classification, dendrograms, area under the curve, and accuracy.
It was interesting that the top words were not especially revealing of the documents' sensitive subjects. Many people expected they would be, but the natural language processing behind the model showed that mentions of these ordinary words were actually where the heaviest influence lay. Everyday words and the way the documents were composed gave the strongest clues to the classification of the documents.
Dendrograms cluster words that appear together frequently in the documents. They reveal trends and valuable information about the text.
The included dendrograms were generated in the preliminary stages of my process. The outlined clusters revealed words that frequently appeared together, which turned out to be disclaimers we had originally overlooked. Filtering them out improved the preprocessing stage and the overall model.
Accuracy is a measure of the predictive power of the model on the testing cases held out from the original sample. It is important to note that the model misclassifies uncontrolled documents as controlled more often than the other way around. This was a goal I discussed with my managers: the model cannot be perfect, but we preferred it to err in that direction.
Area Under the Curve (AUC) is essentially a measure of how well the model ranks documents: the likelihood that a randomly chosen controlled document receives a higher controlled score than a randomly chosen uncontrolled one.
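A minimal sketch of computing both measures in R, continuing from the SVM sketch above (pROC is one package that computes AUC; the decision-value extraction assumes e1071's interface):

```r
library(pROC)   # roc() and auc()

actual <- doc_labels[-train_idx]

# Accuracy and confusion matrix: check which direction the errors fall in
table(Predicted = svm_pred, Actual = actual)
mean(svm_pred == actual)

# AUC from the SVM decision scores: how well the model ranks one class above the other
svm_scores <- attr(predict(svm_model, X[-train_idx, ], decision.values = TRUE),
                   "decision.values")
auc(roc(response = actual, predictor = as.numeric(svm_scores)))
```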
I completed my internship by successfully developing a model that proved the concept of using text analytics to automatically classify the company's documents.
The next steps for the company are to continue building an accurate, random, representative sample of Textron's documents across its business units. The structure of the model will be the one I developed, but with additional business-unit-specific information Textron can build models that distinguish specific types of controlled and uncontrolled documents. The technology also needs to be packaged into a clean application so it can be adopted by the company as a whole.
The iterative process of determining where the model is failing will last the model's entire lifespan. Feeding misclassified results back into the model and examining its performance on small datasets will be important for seeing where improvements can be made. Maintenance also includes pulling in new documents to keep the model up to date with the changing content of controlled and uncontrolled documents.
Goal:
Explore social media for interesting findings on Game 1 of the 2015 World Series: New York Mets vs. Kansas City Royals.
My team logged tweets from Twitter based on the search term “world series”. We chose a search instead of a hashtag to get broader results. We collected tweets throughout the entire first game of the 2015 World Series, which turned out to be quite interesting: it lasted over 5 hours, included a delay of game, and ran 14 innings, making it the longest Game 1 in World Series history.
Tweet Archivist logged basic data about the tweets, such as the timestamp, the user, any interactions with the tweet, and of course the content of the tweet itself. We exported this data as a CSV file and imported it into SAP HANA, which we used as our database engine and for exploratory analytics. The only preprocessing we did was to remove non-English tweets.
Using the timestamp of the tweets, we classified the tweets into three “buckets” of time: intervals of hours, half hours, and quarter hours.
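The bucketing was done in HANA, but the idea is simple; here is an R sketch assuming a `tweets` data frame with the timestamp already parsed as POSIXct:

```r
# Assign each tweet to hour, half-hour, and quarter-hour buckets
tweets$hour_bucket    <- cut(tweets$timestamp, breaks = "1 hour")
tweets$half_bucket    <- cut(tweets$timestamp, breaks = "30 min")
tweets$quarter_bucket <- cut(tweets$timestamp, breaks = "15 min")

# Tweets per 15-minute interval, comparable to the HANA calculation view below
table(tweets$quarter_bucket)
```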
We generated calculation views with these new columns in SAP HANA, where we could see the number of tweets for each bucket of time. The accompanying picture shows the number of tweets over time in 15-minute intervals.
Using SAP HANA’s built-in functionality to aid linguistic analysis by automatically extracting meaning from text data, we applied the Voice of the Customer function to our dataset.
This function extracts entity types from a predefined list (examples include facility, general request, organization, entertainment, sentiment, vehicle, product, people, country, etc.), performs sentiment analysis (the attitude of the tweets), and tokenizes the tweets (breaks them into individual terms that can be used as variables for analysis).
Using the columns generated by the Voice of the Customer function, we were able to count the entity types and determine the stance and influence of our tweeters, which we used to cluster the tweeters into a few different groups.
We could visualize the number of each type of entity mentioned in our dataset, and after removing the obvious and miscellaneous ones we generated the accompanying word cloud.
We created a table of the top 1,000 tokens along with their frequency counts. We connected R to HANA with an Open Database Connectivity (ODBC) bridge and used the wordcloud package to generate a word cloud of the top tokens, as seen in the accompanying picture.
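A minimal version of that step (the ODBC data source name, credentials, and table/column names are placeholders):

```r
library(RODBC)       # ODBC bridge to HANA
library(wordcloud)   # word cloud plotting

# Pull the top-token table from HANA; DSN, login, and table name are placeholders
ch     <- odbcConnect("HANA_DSN", uid = "user", pwd = "password")
tokens <- sqlQuery(ch, "SELECT TOKEN, FREQUENCY FROM TOP_TOKENS")
odbcClose(ch)

# Plot the word cloud of the top tokens, sized by frequency
wordcloud(words = tokens$TOKEN, freq = tokens$FREQUENCY,
          max.words = 200, random.order = FALSE)
```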
Seeing that "delay" was a frequently mentioned term, we decided to dig deeper. Using the burst detection algorithm, we generated the accompanying graphic showing the burst level associated with the token "delay".
Burst level is a measure that increases when there is an intense "burst" of a given token. The algorithm uses a Markov model to detect periods of increased activity in a series of events, so if something like a power outage happens and many people use the word "delay" in their tweets, the burst level will be high.
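The R bursts package implements Kleinberg's burst detection; a sketch of applying it to the tweets that mention "delay" (the `tweets` frame and its `text` column are assumptions carried over from the bucketing sketch):

```r
library(bursts)   # Kleinberg burst detection

# Timestamps of tweets whose text mentions "delay"
delay_times <- tweets$timestamp[grepl("delay", tweets$text, ignore.case = TRUE)]

# Detect bursts: periods where "delay" tweets arrive unusually fast
# (duplicate timestamps are dropped so the offsets are strictly increasing)
delay_bursts <- kleinberg(sort(unique(as.numeric(delay_times))))

# Plot the burst hierarchy over time for the token "delay"
plot(delay_bursts)
```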
We also mapped the frequency of the mention of the term "delay" over time and got similar results. We added our inferences to the accompanying graph.
HANA classifies tweets as Strong Positive (1), Weak Positive (0.5), Neutral (0), Weak Negative (-0.5), and Strong Negative (-1).
I assigned the parenthetical number values above to each sentiment type, took the average sentiment of the users in our 15-minute interval buckets, and plotted this average sentiment over time.
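A sketch of that aggregation in R (the sentiment label strings are assumptions about how the HANA output is coded; `quarter_bucket` comes from the earlier bucketing sketch):

```r
# Map the sentiment labels to the numeric scale above (label strings assumed)
score_map <- c("StrongPositiveSentiment" =  1.0,
               "WeakPositiveSentiment"   =  0.5,
               "NeutralSentiment"        =  0.0,
               "WeakNegativeSentiment"   = -0.5,
               "StrongNegativeSentiment" = -1.0)
tweets$score <- score_map[as.character(tweets$sentiment)]

# Average sentiment per 15-minute bucket, then plot it over time
avg_sent <- aggregate(score ~ quarter_bucket, data = tweets, FUN = mean)
plot(avg_sent$score, type = "l",
     xlab = "15-minute interval", ylab = "Average sentiment")
```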
Notice how the lowest point matches up with the highest frequency of the term "delay". The post-game positivity could be the zeal of an eventful World Series finally getting underway.
Using Excel's Power View map feature, we plotted the number of tweets of each sentiment by location. The size of each pie chart represents the number of tweets, and the colors correspond to their respective sentiments.
It was interesting to see just how global the World Series audience is, as characterized by the geographical spread of opinions on the first game.
We derived a tweeter stance attribute from the sum of the sentiment a user tweeted: a tweeter who tweets more positively than negatively overall has a higher stance score. We derived a tweeter influence attribute from a tweeter's interactions with others: a tweeter who gets more replies and retweets has a higher influence score.
We appended the number of tweets for each user, then took the top 300 tweeters to see if we could segment them in any significant way, using SAP Predictive Analytics to cluster with R's K-Means algorithm.
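A minimal R sketch of that clustering step (the actual run used SAP Predictive Analytics calling R's k-means; the `tweeters` data frame and its column names are assumptions):

```r
# Assumed data frame: one row per tweeter with stance, influence, and tweet count
top300 <- head(tweeters[order(-tweeters$tweet_count), ], 300)

# Standardize the features so no single scale dominates the distance measure
features <- scale(top300[, c("stance", "influence", "tweet_count")])

# R's built-in k-means with 3 clusters
set.seed(2015)
km <- kmeans(features, centers = 3, nstart = 25)
top300$cluster <- km$cluster
table(top300$cluster)   # cluster sizes
```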
We came up with 3 clusters and two main findings. We were able to segment our tweeters into the influential, the positive, and everybody else, and the influential cluster had relatively neutral tweets. Further exploration revealed that tweeters in this cluster were mostly news sources that remained unbiased and therefore had mostly neutral sentiment.