Selling Laptops Smart Marketing

CS320: Data Science Programming II

Project Overview

This Python project aims to predict user interest in a promotional email campaign for a laptop sale on a retail website. By analyzing historical user interaction data with similar promotions, the project utilizes a Logistic Regression model to classify users likely to be interested in the campaign. The goal is to increase campaign effectiveness by targeting the right audience and achieving high user engagement.

Features

Data Integration: Combines user demographic and behavioral data to form a comprehensive dataset.
Custom Preprocessing: Implements a column transformer to handle categorical and numerical data.
Model Training: Utilizes a logistic regression within a pipeline that includes preprocessing steps.
Predictive Analysis: Predicts user interest based on their profiles and past interactions.

Technology Stack

Python 3: Core language used for implementation and analysis.
Pandas: Data manipulation and tabular data processing.
GeoPandas: Geographic data operations (when applicable).
Matplotlib and Rasterio: Data visualization and raster data handling.
Scikit-learn: Machine learning modeling and evaluation.
SQLite: Database interactions.

Getting Started

Prerequisites

Ensure you have Python 3 installed, along with the necessary libraries:

numpy
pandas
matplotlib
geopandas
rasterio
scikit-learn

You can install the required libraries using the following command:

pip install numpy pandas matplotlib geopandas rasterio scikit-learn

Installation

Download the File:
main.py

Usage

Run the main Python script to execute the prediction model:

python main.py

This script will load the data, perform preprocessing, train the logistic regression model, and output predictions on whether users will be interested in the promotional email.

Data Description

The dataset includes user demographics (age, badge levels), past purchase amounts, and engagement metrics (like total_visits). These features are critical for predicting user interest accurately.

Model Details

Logistic Regression: Chosen for its efficacy in binary classification tasks.
Preprocessing: Categorical variables are one-hot encoded, while numerical variables are standardized.
Train-Test Split: The data is split into training and testing sets to ensure the model's performance is validated independently.

License

Distributed under the MIT License. SeeLICENSE.txt for more information.