Random Forest in Income per Person

Introduction

In the Gapminder data, income per person is analyzed in relation to oil consumption, CO2 emissions, internet use rate, and so on.
Because these features are related to social welfare, they may be correlated with income per person.

Getting and Preparing Data


incomeperperson: Gross Domestic Product per capita
oilperperson: oil consumption per capita
co2emissions: CO2 emissions
internetuserate: internet users (per 100 people)
lifeexpectancy: life expectancy at birth (years)
polityscore: democracy score minus autocracy score
relectricperperson: residential electricity consumption per person
urbanrate: urban population
employrate: percentage of the total population aged above 15 that has been employed



The target data is converted into 12 categories, and its data type is changed to string for tree classification.
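The binning step can be sketched roughly as follows. This is a minimal illustration, not the code used for this post: the income values are synthetic stand-ins, and the use of equal-frequency bins via `pd.qcut` is an assumption, since the post does not say how the 12 categories were cut.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the incomeperperson column (NOT real Gapminder values).
rng = np.random.default_rng(0)
income = pd.Series(rng.lognormal(mean=8.0, sigma=1.2, size=200),
                   name="incomeperperson")

# Assumption: 12 equal-frequency bins. labels=False returns integer bin codes 0..11.
income_bin = pd.qcut(income, q=12, labels=False)

# Cast the bin codes to strings so they are treated as class labels, not numbers.
income_class = income_bin.astype(str)

print(income_class.value_counts().sort_index())
```

Equal-width bins (`pd.cut`) would be the other obvious choice; with a skewed variable such as income they tend to leave most countries in the lowest few bins.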


Data Modelling




The target is income per person; the remaining columns serve as explanatory variables, and the data are split into training and test sets.
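A split-and-fit step along these lines might look like the sketch below. The feature names come from the variable list above, but the data are random placeholders, and the split ratio, number of trees, and random seeds are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

features = ["oilperperson", "co2emissions", "internetuserate", "lifeexpectancy",
            "polityscore", "relectricperperson", "urbanrate", "employrate"]

# Random placeholder data standing in for the cleaned Gapminder table.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(240, len(features))), columns=features)
y = pd.Series(rng.integers(0, 12, size=240).astype(str), name="incomeperperson")

# Assumption: hold out 25% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Assumption: 25 trees; the post does not state the forest size.
clf = RandomForestClassifier(n_estimators=25, random_state=0)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

print(X_train.shape, X_test.shape)
```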



Results

Accuracy is 41%, which is pretty low. The model needs more explanatory variables.
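For reference, accuracy here means the share of test rows whose predicted category matches the true one. The toy labels below are made up purely to show the calculation. The chance level shown assumes the 12 classes are equally sized, which the post does not confirm.

```python
from sklearn.metrics import accuracy_score

# Made-up labels for illustration only (not results from the model above).
y_true = ["0", "1", "2", "2", "3", "3", "4", "5"]
y_pred = ["0", "1", "2", "3", "3", "3", "5", "5"]

# Fraction of exact matches: 6 of the 8 predictions agree with the truth.
acc = accuracy_score(y_true, y_pred)

# Assumption: 12 balanced classes, so a uniform random guess is right 1 time in 12.
chance = 1 / 12

print(f"accuracy = {acc:.2f}, chance level = {chance:.3f}")
```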

Feature Importance

Internetuserate, at 19%, has the largest effect on the model. The second most important feature is urbanrate, at 17%. Higher urbanrate and internetuserate are expected to show some correlation with our target variable, incomeperperson.


Oilperperson, with a score of 5%, has the least effect on this model.
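Importance scores like these can be read from a fitted forest's `feature_importances_` attribute. The sketch below uses random placeholder data, with a target deliberately derived from one feature, so its numbers will not match the percentages above. The forest size and seed are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

features = ["oilperperson", "co2emissions", "internetuserate", "lifeexpectancy",
            "polityscore", "relectricperperson", "urbanrate", "employrate"]

# Placeholder data: the target is built from internetuserate so that at least
# one feature carries signal. This is NOT the Gapminder data.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(300, len(features))), columns=features)
y = pd.qcut(X["internetuserate"], q=12, labels=False).astype(str)

clf = RandomForestClassifier(n_estimators=25, random_state=0)
clf.fit(X, y)

# Impurity-based importances, normalized by scikit-learn to sum to 1.
importances = pd.Series(clf.feature_importances_, index=features)
importances = importances.sort_values(ascending=False)

print(importances)
```

Because the scores sum to 1, each can be read as a share of the total, which is how the percentages above are expressed.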



