Multiple Regression in CO2 Emissions


Introduction

Using the gapminder data, a multiple regression is analysed with CO2 emissions as the response variable.

Getting and Preparing Data
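The preparation boils down to coercing the gapminder columns to numeric and dropping rows with missing values. A minimal sketch of that pattern on a toy frame (the values here are made up for illustration; the column names mirror the gapminder file):

```python
import pandas

# toy frame standing in for the gapminder csv (values as read: strings / blanks)
data = pandas.DataFrame({
    'relectricperperson': ['590.5', ' ', '2273.1', '110.2'],
    'co2emissions': ['9.1e8', '4.8e7', ' ', '2.2e9'],
})

# coerce to numeric; blanks and other non-numbers become NaN
for col in ['relectricperperson', 'co2emissions']:
    data[col] = pandas.to_numeric(data[col], errors='coerce')

# listwise deletion: keep only rows with every variable present
clean = data.dropna()
print(len(clean))  # two rows each lose a value, leaving 2 complete rows
```

With `errors='coerce'` the blank strings turn into NaN, so a single `dropna()` afterwards performs the listwise deletion.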


Data Analysis


Electricity consumption is linearly related to CO2 emissions: the p-value is lower than 0.05, so we reject the null hypothesis that there is no association between electricity consumption and CO2 emissions.
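The decision rule applied here can be sketched as a tiny helper (the helper name and the 0.05 threshold are my own illustration, not part of the original analysis):

```python
def h0_decision(p_value, alpha=0.05):
    """Null hypothesis H0: no association between predictor and response.
    Reject H0 (evidence of an association) only when p < alpha."""
    return 'reject H0' if p_value < alpha else 'fail to reject H0'

print(h0_decision(0.01))  # small p for the linear term -> reject H0
print(h0_decision(0.08))  # p above alpha -> fail to reject H0
```

Note that a p-value above the threshold never proves the null hypothesis; it only means the data provide no evidence against it.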

When the electricity-consumption term is raised to a second-order polynomial, the p-value increases to 0.08, so we fail to reject the null hypothesis: there is no evidence of an order-2 association between electricity consumption and CO2 emissions.
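In formula terms, the order-2 fit simply adds a squared column to the design matrix, which is what `I(relectricperperson**2)` does in the statsmodels call in the Codes section. A numpy-only sketch of the same idea on synthetic data (the coefficients 1.5, 2.0, 0.5 are made up for illustration):

```python
import numpy

rng = numpy.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 1.5 + 2.0 * x + 0.5 * x**2 + rng.normal(0, 0.1, 200)

# design matrix with intercept, linear, and squared columns
X = numpy.column_stack([numpy.ones_like(x), x, x**2])
coef, *_ = numpy.linalg.lstsq(X, y, rcond=None)
print(coef)  # estimates close to the true [1.5, 2.0, 0.5]
```

statsmodels builds exactly this kind of matrix from the formula string, then additionally reports a p-value for the squared column so the quadratic term can be tested.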

The R-squared value is 7%, which is quite low, indicating that the model needs more explanatory variables to achieve a better fit.


Adding More Explanatory Variables


Oil consumption and urban rate have been added to the model, but their p-values are higher than 0.05, so there is no evidence of an association between these variables and CO2 emissions. The R-squared value also dropped from 7% to 2% after adding the two unassociated variables, partly because listwise deletion on the extra columns shrinks the sample.


QQ Plot


The residuals deviate from the red reference line, indicating that they are not perfectly normally distributed, so this model clearly needs more explanatory variables to improve the estimation.
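What `sm.qqplot` draws can be sketched with the standard library alone: sorted residuals plotted against the normal quantiles at matching plotting positions. This is an illustrative re-derivation with made-up residuals, not the statsmodels internals:

```python
from statistics import NormalDist

def qq_points(residuals):
    """Pairs (theoretical normal quantile, observed residual) for a Q-Q plot."""
    n = len(residuals)
    observed = sorted(residuals)
    # plotting positions (i - 0.5) / n mapped through the normal inverse CDF
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))

pts = qq_points([0.3, -1.2, 0.8, -0.1, 2.5, -0.6])
# roughly normal residuals hug the reference line; an outlying value
# (like 2.5 here) pulls the upper points away from it
```

Points bending away from the line at the tails, as in the plot above, are the usual signature of skewed or heavy-tailed residuals.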


Codes

import pandas
import numpy
import seaborn
import scipy
import matplotlib.pyplot as plt
import statsmodels.api as sm
import statsmodels.formula.api as smf

data = pandas.read_csv('data/gapminder.csv', low_memory=False)

# setting variables you will be working with to numeric
data['oilperperson'] = pandas.to_numeric(data['oilperperson'], errors='coerce')
data['co2emissions'] = pandas.to_numeric(data['co2emissions'], errors='coerce')
data['relectricperperson'] = pandas.to_numeric(data['relectricperperson'], errors='coerce')
data['urbanrate'] = pandas.to_numeric(data['urbanrate'], errors='coerce')

data['oilperperson'] = data['oilperperson'].replace(' ', numpy.nan)
data['co2emissions'] = data['co2emissions'].replace(' ', numpy.nan)
data['relectricperperson'] = data['relectricperperson'].replace(' ', numpy.nan)
data['urbanrate'] = data['urbanrate'].replace(' ', numpy.nan)

# listwise deletion of missing values
data = data[['urbanrate', 'oilperperson', 'relectricperperson', 'co2emissions']].dropna()

# first order (linear) scatterplot
scat1 = seaborn.regplot(x="relectricperperson", y="co2emissions", scatter=True, data=data)
plt.xlabel('Electricity Consumption')
plt.ylabel('CO2 Emission')

# fit second order polynomial
# run the 2 scatterplots together to get both linear and second order fit lines
scat1 = seaborn.regplot(x="relectricperperson", y="co2emissions", scatter=True, order=2, data=data)
plt.xlabel('Electricity Consumption')
plt.ylabel('CO2 Emission')

# center quantitative IVs for regression analysis
data['oilperperson_c'] = data['oilperperson'] - data['oilperperson'].mean()
data['relectricperperson_c'] = data['relectricperperson'] - data['relectricperperson'].mean()
data['urbanrate_c'] = data['urbanrate'] - data['urbanrate'].mean()

# linear regression analysis
reg1 = smf.ols('co2emissions ~ relectricperperson + oilperperson + urbanrate', data=data).fit()
print(reg1.summary())

# quadratic (polynomial) regression analysis
# re-run the following line if you get PatsyError "'ImaginaryUnit' object is not callable"
reg2 = smf.ols('co2emissions ~ relectricperperson + I(relectricperperson**2)', data=data).fit()
print(reg2.summary())

# Q-Q plot for normality
fig4 = sm.qqplot(reg1.resid, line='r')

