Payment default of credit card clients: EDA and supervised ML modeling
This data science project performs an Exploratory Data Analysis (EDA) on data about default payments of credit card clients and trains several supervised machine learning models to predict default. The data comes from the Default of credit card clients dataset available on Kaggle.
The EDA (i) checks the distributions of selected variables and applies transformations (log, square root and cube root) to those that depart from normality, and (ii) explores the relationship between selected variables and defaulting. A future update will add feature engineering.
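As an illustration of the transformation step, the sketch below compares the skewness of a right-skewed variable before and after the log, square root and cube root transformations. It is a minimal example under stated assumptions, not the project's code: the data is synthetic (drawn from a log-normal distribution), the values are assumed to be non-negative, and the names `skewness`, `amounts` and `transforms` are hypothetical.

```python
import math
import random


def skewness(values):
    """Skewness as the third standardized moment (population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Assumption: synthetic, strictly positive, right-skewed data standing in
# for a monetary column; it is not taken from the actual dataset.
rng = random.Random(0)
amounts = [rng.lognormvariate(10, 1) for _ in range(5000)]

# The three transformations named in the text, plus the raw values.
transforms = {
    "raw": lambda x: x,
    "log": lambda x: math.log1p(x),   # log(1 + x), so zeros are tolerated
    "sqrt": lambda x: math.sqrt(x),
    "cbrt": lambda x: x ** (1 / 3),   # valid here because x >= 0
}

skews = {
    name: skewness([f(v) for v in amounts])
    for name, f in transforms.items()
}

for name, value in skews.items():
    print(f"{name:>5}: skewness = {value:+.3f}")
```

A skewness close to zero suggests a more symmetric distribution, so the transformation with the smallest absolute skewness is a reasonable first candidate. Columns that can take negative values would need a shift or a different transformation before applying the log or the square root.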
The modeling work trains, validates and tests a logistic regression and an Extreme Gradient Boosting (XGBoost) classifier. It begins with a basic model evaluation on train/validation/test splits and finishes with a more systematic evaluation based on hyperparameter tuning with k-fold cross-validation. A future update will replicate the analysis with a neural network and add a basic deployment.
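The evaluation scheme described above can be sketched in a few lines: hold out a test set, tune one hyperparameter with k-fold cross-validation on the remaining data, then refit and score once on the test set. This is a minimal, standard-library sketch under stated assumptions, not the project's implementation: the data is synthetic, the one-feature logistic regression fitted by gradient descent is a stand-in for the project's models, and all names (`make_data`, `fit_logreg`, `k_fold_indices`, `cv_score`, the `l2` grid) are hypothetical. In practice a library such as scikit-learn or XGBoost would likely handle the fitting and the search.

```python
import math
import random


def make_data(n, rng):
    """Synthetic binary data: label 1 is more likely when x is large."""
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        p = 1 / (1 + math.exp(-(2.0 * x - 0.5)))
        rows.append((x, 1 if rng.random() < p else 0))
    return rows


def fit_logreg(rows, l2, epochs=200, lr=0.5):
    """One-feature logistic regression, batch gradient descent, L2 on w."""
    w = b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in rows:
            err = 1 / (1 + math.exp(-(w * x + b))) - y
            grad_w += err * x
            grad_b += err
        w -= lr * (grad_w / n + l2 * w)
        b -= lr * grad_b / n
    return w, b


def accuracy(model, rows):
    """Share of rows whose predicted class matches the label."""
    w, b = model
    return sum((w * x + b > 0) == (y == 1) for x, y in rows) / len(rows)


def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) for k contiguous, non-overlapping folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size


def cv_score(rows, l2, k=5):
    """Mean validation accuracy over k folds for one hyperparameter value."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(rows), k):
        model = fit_logreg([rows[i] for i in train_idx], l2)
        scores.append(accuracy(model, [rows[i] for i in val_idx]))
    return sum(scores) / k


rng = random.Random(42)
data = make_data(600, rng)
dev, test = data[:480], data[480:]   # the test set is never used for tuning

grid = [0.0, 0.01, 0.1, 1.0]         # hypothetical L2 strengths to compare
cv_results = {l2: cv_score(dev, l2) for l2 in grid}
best_l2 = max(cv_results, key=cv_results.get)

final_model = fit_logreg(dev, best_l2)   # refit on all non-test data
test_acc = accuracy(final_model, test)

print("CV accuracy by l2:", cv_results)
print("best l2:", best_l2, "| test accuracy:", round(test_acc, 3))
```

Keeping the test set out of the cross-validation loop means the final score is not influenced by the hyperparameter search, which is the main reason to prefer this scheme over tuning against a single validation split.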
Details about the implementation and the results can be found in the project’s repo.