Stern Program for Undergraduate Research

Closed On: 02-13-2013
Research ID: RS000056
Research Title: Estimating Unobserved Customer Attributes
Description: If we know some characteristics of our customers (age, gender, ethnicity, etc.), we can use these attributes to personalize marketing to each customer. Often, though, we have only indirect knowledge of these customer-specific attributes. For example, we may know a customer's name but not his or her age. In this project, we will use customer names to make inferences about gender; in turn, we will use this indirect knowledge of customer attributes to estimate purchasing behavior.

The bulk of the work will involve taking data from the US Census and the Social Security Administration to build a model relating name to gender. This will involve a small amount of statistics and a large amount of programming. You will write a series of Python programs to automatically download data from these government websites and process it into a usable form. You will then write a program that takes a person's name as input and returns the probability of that person being male or female as output. If time allows, you will use the inferred gender and a regression model to relate a person's name to the amount that person is expected to contribute to a political campaign, using data you download and process from the Federal Election Commission website.
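
As a rough illustration of the name-to-gender step, the sketch below estimates the probability that a name belongs to a female or male person from counts of how often the name appears with each sex. It is only one possible approach, not the project's prescribed method. The "name,sex,count" layout, the made-up sample counts, and the function names load_counts and gender_probabilities are all assumptions for illustration; the smoothing toward a 50/50 prior is likewise an assumed choice for handling rare or unseen names.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in "name,sex,count" form; these counts are made up
# and do not come from any government data set.
SAMPLE = """\
Mary,F,900
Mary,M,100
John,M,950
John,F,50
Jordan,M,600
Jordan,F,400
"""


def load_counts(lines):
    """Sum counts per lowercased name and sex from "name,sex,count" rows."""
    counts = defaultdict(lambda: {"F": 0, "M": 0})
    for name, sex, n in csv.reader(lines):
        counts[name.lower()][sex] += int(n)
    return counts


def gender_probabilities(name, counts, prior_female=0.5, strength=2.0):
    """Return smoothed probabilities that `name` is female or male.

    The estimate is pulled toward `prior_female` with a weight of
    `strength` pseudo-observations (an assumed smoothing scheme), so
    names with few or no records fall back to the prior.
    """
    c = counts.get(name.lower(), {"F": 0, "M": 0})
    total = c["F"] + c["M"]
    p_female = (c["F"] + prior_female * strength) / (total + strength)
    return {"female": p_female, "male": 1.0 - p_female}


counts = load_counts(io.StringIO(SAMPLE))
```

Calling gender_probabilities("Mary", counts) returns a dictionary with "female" and "male" entries that sum to one; a name absent from the data returns the prior for both. In the full project, the counts would be built from the downloaded government files rather than from an in-memory sample.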

There will be a few statistical challenges here, for which you will receive guidance. The bulk of your time will be spent programming in Python.
Relevant Areas of Study: Marketing
Pre-requisites: STAT-UB.0103; strong knowledge of Python programming
Start Semester: Spring 2013
Credits Per Semester: 2.0
Faculty Member: Patrick Perry (
Department Affiliation: N/A
Contact: Patrick Perry (
Expired On: 03-31-2013