Which titles matter for predicting Survivability for the Titanic dataset

datasciencedojo · October 16, 2017, 11:27pm

The code below builds a random forest on the Titanic dataset and plots variable importance, to show which features matter for predicting Survived.

Reading the Titanic dataset and converting the Survived variable into a factor:

1. Read the Titanic file. Set the working directory so that titanic.csv can be found.
2. Convert the Survived variable into a factor column.

titanic <- read.csv(
              file = "titanic.csv",
              stringsAsFactors = FALSE
              )     #(1)

titanic$Survived <- as.factor(titanic$Survived)      #(2)
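
A quick way to confirm the conversion worked, as a minimal sketch. The `toy` data frame and its values are made up for illustration and are not rows from titanic.csv. The step matters because randomForest assumes classification when the response is a factor and regression otherwise.

```r
# Toy stand-in for titanic.csv (values are made up for illustration).
toy <- data.frame(Survived = c(0, 1, 1, 0, 1))

class(toy$Survived)                      # numeric before conversion
toy$Survived <- as.factor(toy$Survived)

levels(toy$Survived)                     # the two classes, "0" and "1"
survived_counts <- table(toy$Survived)   # number of rows per class
print(survived_counts)
```

Running the same `class()`, `levels()` and `table()` calls on `titanic$Survived` should show a factor with two levels.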

Encoding categories as indicator variables:

Converting each Pclass value into its own 0/1 variable:

titanic$pclass_one <- 0
titanic$pclass_two <- 0
titanic$pclass_three <- 0
titanic[titanic$Pclass==1,"pclass_one"] <- 1
titanic[titanic$Pclass==2,"pclass_two"] <- 1
titanic[titanic$Pclass==3,"pclass_three"] <- 1

Converting each value of the Embarked variable into its own 0/1 variable:

titanic$embarked_q <- 0
titanic$embarked_s <- 0
titanic$embarked_c <- 0
titanic[titanic$Embarked=="Q","embarked_q"] <- 1
titanic[titanic$Embarked=="S","embarked_s"] <- 1
titanic[titanic$Embarked=="C","embarked_c"] <- 1

Converting each value of Sex into its own 0/1 variable:

titanic$sex_m <- 0
titanic$sex_f <- 0
titanic[titanic$Sex=="male","sex_m"] <- 1
titanic[titanic$Sex=="female","sex_f"] <- 1
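
The three encoding steps above can also be written more compactly. The sketch below is one possible alternative, not the original author's method: it uses base R's `model.matrix` for Pclass and a direct logical comparison for the other columns. The `toy` data frame is made up for illustration, and the renaming step assumes all three classes occur in the data.

```r
# Toy stand-in for the Titanic columns (values are made up for illustration).
toy <- data.frame(
  Pclass   = c(1, 3, 2, 3),
  Embarked = c("S", "C", "Q", "S"),
  Sex      = c("male", "female", "female", "male"),
  stringsAsFactors = FALSE
)

# model.matrix expands a factor into indicator columns;
# "- 1" drops the intercept so every level of the factor gets a column.
# Assumes classes 1, 2 and 3 are all present, otherwise the rename is wrong.
pclass_dummies <- model.matrix(~ factor(Pclass) - 1, data = toy)
colnames(pclass_dummies) <- c("pclass_one", "pclass_two", "pclass_three")

# The same idea as a direct comparison, one column at a time.
toy$sex_m      <- as.integer(toy$Sex == "male")
toy$sex_f      <- as.integer(toy$Sex == "female")
toy$embarked_s <- as.integer(toy$Embarked == "S")

toy <- cbind(toy, pclass_dummies)
print(toy)
```

Both forms produce the same 0/1 columns as the assignments above; the comparison form avoids initializing each column to 0 first.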

Filling missing values of Age with a constant (28):

titanic[is.na(titanic$Age),"Age"] <- 28
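
The value 28 is hard-coded above. If the intent is to fill in a typical age (an assumption, the post does not say where 28 comes from), one option is to compute the median from the data instead of typing it in. A minimal sketch on made-up values:

```r
# Toy stand-in for the Age column (values are made up for illustration).
toy <- data.frame(Age = c(22, NA, 35, 28, NA, 54))

# Median of the observed ages; na.rm = TRUE ignores the missing entries.
age_median <- median(toy$Age, na.rm = TRUE)

# Replace only the missing entries, leaving observed ages untouched.
toy[is.na(toy$Age), "Age"] <- age_median
print(toy$Age)
```

Computing the value this way keeps the imputation consistent if the input file changes.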

Building Random Forest:

1. Install the randomForest package if you have not installed it.
2. Load the package.
3. Create the list of features for the random forest.
4. Create the random forest model.
5. Use varImpPlot to create a dot chart of the variable importance calculated by the random forest.

if (!requireNamespace("randomForest", quietly = TRUE)) install.packages("randomForest")    #(1)
library(randomForest)                     #(2)
features <- c("Survived", "Age", "SibSp", "Parch", "Fare",
              "pclass_one", "pclass_two", "pclass_three",
              "embarked_q", "embarked_s", "embarked_c",
              "sex_m", "sex_f")           #(3)
titanic.forest <- randomForest(Survived ~ ., data = titanic[, features], importance = TRUE)    #(4)
varImpPlot(titanic.forest)                #(5)

[Figure: varImportance, the variable importance plot produced by varImpPlot]

toobamukhtar · April 15, 2019, 6:24pm

toobamukhtar · April 16, 2019, 12:34pm

One way to analyze how much impact a feature has on the target (Survived) is to use the feature importance measures exposed by a machine learning library.
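
With the randomForest package used earlier in this thread, those measures can be read as numbers through `importance()` rather than only plotted. The sketch below runs on synthetic data: the `toy` data frame and the `signal` and `noise` columns are made up for illustration, and `set.seed(42)` is an arbitrary choice to make the run repeatable.

```r
library(randomForest)

set.seed(42)  # random forests are randomized; fix the seed for repeatability

# Synthetic stand-in data (made up): 'signal' drives the outcome, 'noise' does not.
n <- 200
toy <- data.frame(
  signal = rnorm(n),
  noise  = rnorm(n)
)
toy$Survived <- as.factor(ifelse(toy$signal + rnorm(n, sd = 0.3) > 0, 1, 0))

toy.forest <- randomForest(Survived ~ ., data = toy, importance = TRUE)

# importance() returns a matrix with one row per feature. For classification
# with importance = TRUE it includes the columns MeanDecreaseAccuracy and
# MeanDecreaseGini.
imp <- importance(toy.forest)

# Sort features from most to least important by permutation importance.
imp_sorted <- imp[order(imp[, "MeanDecreaseAccuracy"], decreasing = TRUE), ]
print(imp_sorted)
```

Applied to the model above, `importance(titanic.forest)` gives the numbers behind the varImpPlot chart, which makes it easier to compare features or export the ranking.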