How to build a Decision Tree using the Titanic dataset?

Building a decision tree model using rpart and the Titanic dataset:

You can get the required dataset here.
Note: Set your working directory to the folder that contains titanic.csv.

1. Reading file into dataframe:

titanic <- read.csv(file = "titanic.csv",
            stringsAsFactors = FALSE)

2. Cleaning missing values:

a. Cleaning embarked:

titanic[titanic$Embarked=="","Embarked"] <- "S"

b. Cleaning age:

Finding rows whose name contains the title "Master":

masters <- grep(pattern = "Master\\.",
                x = titanic$Name,
                ignore.case = TRUE)

c. Calculating median age for masters:

median.masters <- median(
                  titanic[masters, "Age"],
                  na.rm = TRUE)

d. Engineering a masters column

titanic$IsMaster <- FALSE
titanic[masters, "IsMaster"] <- TRUE
is.master <- titanic$IsMaster==TRUE
age.missing <- is.na(titanic$Age)

e. Filling in missing values of age:

titanic[is.master & age.missing, "Age"] <- median.masters

f. Cleaning remaining age values:

median.age <- median(
  titanic[!is.master, "Age"],
  na.rm = TRUE)
age.missing <- is.na(titanic$Age)
titanic[age.missing, "Age"] <- median.age
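The age-cleaning steps above can be tried on a small made-up data frame first. This is only a sketch: the `toy` data frame and its values are invented for illustration and are not part of the Titanic data.

```r
# Toy data frame standing in for the Titanic data (all values are made up).
toy <- data.frame(
  Name = c("Smith, Master. Tom", "Jones, Mr. Bob", "Lee, Master. Sam",
           "Brown, Mrs. Ann", "Gray, Master. Tim", "White, Mr. Joe"),
  Age  = c(4, 40, NA, 30, 6, NA),
  stringsAsFactors = FALSE
)

# Logical flags: is the passenger a "Master", and is the age missing?
is.master   <- grepl("Master\\.", toy$Name)
age.missing <- is.na(toy$Age)

# Compute both medians before filling anything, ignoring the NAs.
median.masters <- median(toy$Age[is.master],  na.rm = TRUE)
median.others  <- median(toy$Age[!is.master], na.rm = TRUE)

# Fill each group's missing ages with that group's median.
toy$Age[is.master  & age.missing] <- median.masters
toy$Age[!is.master & age.missing] <- median.others

# Sanity check: the count of missing ages should now be zero.
sum(is.na(toy$Age))
```

Running the same `sum(is.na(titanic$Age))` check on the real data frame after step f is a quick way to confirm that no missing ages were left behind.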

3. Casting variables into factors:

titanic$Survived <- as.factor(titanic$Survived)
titanic$Pclass <- as.factor(titanic$Pclass)
titanic$Sex <- as.factor(titanic$Sex)
titanic$Embarked <- as.factor(titanic$Embarked)
titanic$IsMaster <- as.factor(titanic$IsMaster)

4. Splitting data into train and test data:

bag <- nrow(titanic)
train.indices <- sample(1:bag, bag * .7)
titanic.train <- titanic[train.indices,]
titanic.test <- titanic[-train.indices,]
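Because sample() is random, the split above changes on every run. If you want the same split each time, one option is to call set.seed() first. A minimal sketch on a made-up data frame (the seed value 42 and the `toy` object are arbitrary choices for illustration):

```r
n   <- 10
toy <- data.frame(id = seq_len(n))

# Fixing the seed makes the random draw repeatable.
set.seed(42)
train.indices <- sample(seq_len(n), size = floor(0.7 * n))

toy.train <- toy[train.indices, , drop = FALSE]
toy.test  <- toy[-train.indices, , drop = FALSE]

# Drawing again with the same seed gives the same indices.
set.seed(42)
again <- sample(seq_len(n), size = floor(0.7 * n))
identical(train.indices, again)

# Train and test rows should not overlap.
length(intersect(toy.train$id, toy.test$id))
```

With the real data, placing a single set.seed() call before the sample() line in step 4 should be enough.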

5. Creating a list of features:

features = c("Survived", "Pclass", "Sex",
              "Age", "SibSp", "Parch", "Fare",
              "Embarked", "IsMaster")

6. Creating machine learning model with rpart:

library(rpart)

titanic.tree <- rpart(
  formula = Survived ~ .,
  data = titanic.train[, features])
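Once a tree is fitted it can help to look at its splits before predicting. The sketch below uses the `kyphosis` data set that ships with rpart instead of the Titanic file, so it runs without any download; the same print() and printcp() calls should work on titanic.tree.

```r
library(rpart)

# Fit a classification tree on rpart's bundled kyphosis data.
fit <- rpart(Kyphosis ~ Age + Number + Start,
             data = kyphosis,
             method = "class")

# Text view of the splits, one node per line.
print(fit)

# Complexity parameter table, useful when deciding whether to prune.
printcp(fit)
```

The tree can also be drawn with plot(fit) followed by text(fit).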

7. Making predictions:

predictions <- predict(
  titanic.tree, newdata = titanic.test,
  type = "class")
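A common next step is to compare the predictions with the actual labels. Here is a small sketch with made-up label vectors; with the objects from this post, the equivalent call would presumably be table(titanic.test$Survived, predictions).

```r
# Made-up actual and predicted labels, for illustration only.
actual    <- factor(c(0, 1, 1, 0, 1, 0), levels = c(0, 1))
predicted <- factor(c(0, 1, 0, 0, 1, 1), levels = c(0, 1))

# Confusion matrix: rows are actual classes, columns are predicted classes.
confusion <- table(Actual = actual, Predicted = predicted)
print(confusion)

# Accuracy is the share of cases on the diagonal (correct predictions).
accuracy <- sum(diag(confusion)) / sum(confusion)
accuracy
```

In this toy case 4 of the 6 predictions match, so the accuracy comes out to about 0.67.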