Multiple Regression in Co2 Emission
Introduction
Using the gapminder data, a multiple regression is fitted with CO2 emissions as the response variable.
When a second-order polynomial term for electricity consumption is added to the model, its p-value increases to 0.08. This is above the conventional 0.05 threshold, so we fail to reject the null hypothesis: there is no significant second-order (quadratic) association between CO2 emissions and electricity consumption.
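For reference, the second-order model and the hypothesis being tested can be written out as follows. The symbols are notation introduced here for illustration, with x standing for relectricperperson.

```latex
\begin{aligned}
\text{co2emissions} &= \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon \\
H_0 &: \beta_2 = 0 \qquad H_1 : \beta_2 \neq 0
\end{aligned}
```

The p-value of 0.08 belongs to the estimate of \beta_2, the coefficient on the squared term.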
The R-squared value is 7%, which is low: the model explains only about 7% of the variance in CO2 emissions, so more explanatory variables are needed to improve the fit.
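As a rough illustration of what R-squared measures, here is a minimal sketch that computes it by hand from observed and fitted values. The function name r_squared and the toy numbers are hypothetical and are not taken from the gapminder data; in statsmodels the same quantity is available as the rsquared attribute of a fitted model.

```python
def r_squared(y, y_hat):
    """Share of the variance in y explained by the fitted values y_hat."""
    mean_y = sum(y) / len(y)
    # residual sum of squares: what the model leaves unexplained
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    # total sum of squares: variation of y around its own mean
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# hypothetical toy values, for illustration only
observed = [1.0, 2.0, 3.0, 4.0]
fitted = [1.5, 2.0, 3.0, 3.5]
print(r_squared(observed, fitted))
```

A value near 0, such as the 7% reported here, means the fitted values track the observations only slightly better than the plain mean would.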
Adding More Explanatory Variables
QQ Plot
The residuals deviate from the red line, indicating that they are not perfectly normally distributed. So this model definitely needs more explanatory variables to improve the estimation.
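To make the idea behind the Q-Q plot concrete, here is a small sketch that pairs sorted, standardized residuals with the quantiles a normal distribution would give. It uses only the Python standard library. The function name qq_points, the plotting positions (i - 0.5) / n, and the residual values are assumptions made for illustration; this is not necessarily how statsmodels computes its plot.

```python
from statistics import NormalDist, mean, stdev


def qq_points(residuals):
    """Pair each theoretical normal quantile with a sorted, standardized residual."""
    n = len(residuals)
    m, s = mean(residuals), stdev(residuals)
    sample = sorted((r - m) / s for r in residuals)
    # plotting positions (i - 0.5) / n are one common convention, assumed here
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sample))


# hypothetical residuals with one large positive outlier
for t, s in qq_points([-1.2, -0.4, 0.1, 0.3, 0.5, 4.0]):
    print(f"{t:6.2f} {s:6.2f}")
```

If the residuals were normal, the two columns would be roughly equal. Points where the second column pulls away from the first correspond to the places where the plot departs from the reference line.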
Code
import pandas
import numpy
import seaborn
import scipy
import matplotlib.pyplot as plt
import statsmodels.api as sm
import statsmodels.formula.api as smf
data = pandas.read_csv('data/gapminder.csv', low_memory=False)
# convert the variables used in the analysis to numeric
# errors='coerce' turns blanks and other non-numeric entries into NaN,
# so a separate replace(' ', numpy.nan) step is not needed
cols = ['urbanrate', 'oilperperson', 'relectricperperson', 'co2emissions']
for col in cols:
    data[col] = pandas.to_numeric(data[col], errors='coerce')
# listwise deletion of missing values
data = data[['urbanrate', 'oilperperson', 'relectricperperson','co2emissions']].dropna()
# first order (linear) scatterplot
scat1 = seaborn.regplot(x="relectricperperson", y="co2emissions", scatter=True, data=data)
# fit second order polynomial
# both calls draw on the same axes, so the linear and second order fit lines appear together
scat1 = seaborn.regplot(x="relectricperperson", y="co2emissions", scatter=True, order=2, data=data)
plt.xlabel('Electricity Consumption')
plt.ylabel('CO2 Emission')
# center quantitative IVs (the models below are fit on the original, uncentered columns)
data['oilperperson_c'] = (data['oilperperson'] - data['oilperperson'].mean())
data['relectricperperson_c'] = (data['relectricperperson'] - data['relectricperperson'].mean())
data["urbanrate_c"] = (data['urbanrate'] - data['urbanrate'].mean())
# linear regression analysis
reg1 = smf.ols('co2emissions ~ relectricperperson + oilperperson + urbanrate', data=data).fit()
print(reg1.summary())
# quadratic (polynomial) regression analysis
# I(...) makes the formula parser treat ** as ordinary arithmetic, so the squared term is computed as written
reg2 = smf.ols('co2emissions ~ relectricperperson + I(relectricperperson**2)', data=data).fit()
print(reg2.summary())
#Q-Q plot for normality
fig4 = sm.qqplot(reg1.resid, line='r')
plt.show()



