Project Overview
This Python project aims to predict user interest in a promotional email campaign for a laptop sale on a retail website. By analyzing historical user interaction data with similar promotions, the project utilizes a Logistic Regression model to classify users likely to be interested in the campaign. The goal is to increase campaign effectiveness by targeting the right audience and achieving high user engagement.
Features
- Data Integration: Combines user demographic and behavioral data to form a comprehensive dataset.
- Custom Preprocessing: Implements a column transformer to handle categorical and numerical data.
- Model Training: Utilizes a logistic regression within a pipeline that includes preprocessing steps.
- Predictive Analysis: Predicts user interest based on their profiles and past interactions.
Technology Stack
- Python 3: Core language used for implementation and analysis.
- Pandas: Data manipulation and tabular data processing.
- GeoPandas: Geographic data operations (when applicable).
- Matplotlib and Rasterio: Data visualization and raster data handling.
- Scikit-learn: Machine learning modeling and evaluation.
- SQLite: Database interactions.
Getting Started
Prerequisites
Ensure you have Python 3 installed, along with the necessary libraries:
- numpy
- pandas
- matplotlib
- geopandas
- rasterio
- scikit-learn
You can install the required libraries using the following command:
pip install numpy pandas matplotlib geopandas rasterio scikit-learnInstallation
- Download the File:
main.py
Usage
Run the main Python script to execute the prediction model:
python main.pyThis script will load the data, perform preprocessing, train the logistic regression model, and output predictions on whether users will be interested in the promotional email.
Data Description
The dataset includes user demographics (age, badge levels), past purchase amounts, and engagement metrics (like total_visits). These features are critical for predicting user interest accurately.
Model Details
- Logistic Regression: Chosen for its efficacy in binary classification tasks.
- Preprocessing: Categorical variables are one-hot encoded, while numerical variables are standardized.
- Train-Test Split: The data is split into training and testing sets to ensure the model's performance is validated independently.
License
Distributed under the MIT License. SeeLICENSE.txt for more information.
