Multivariate adaptive regression splines models for vehicular emission prediction
 Seth Daniel Oduro^{1},
 Santanu Metia^{1},
 Hiep Duc^{2},
 Guang Hong^{1} and
 Q.P. Ha^{1}
 Received: 21 January 2014
 Accepted: 23 March 2015
 Published: 10 June 2015
Abstract
Background
Rate models for predicting vehicular emissions of nitrogen oxides (NO _{ X }) are insensitive to the vehicle modes of operation, such as cruise, acceleration, deceleration and idle, because these models are usually based on the average trip speed. This study demonstrates the feasibility of using other variables, such as vehicle speed, acceleration, load, power and ambient temperature, to predict NO _{ X } emissions, so that the emission inventory is accurate and air quality modelling and management plans can be designed and implemented appropriately.
Methods
We propose to use the nonparametric Boosting Multivariate Adaptive Regression Splines (BMARS) algorithm to improve the accuracy of Multivariate Adaptive Regression Splines (MARS) modelling and to effectively predict NO _{ X } emissions of vehicles in accordance with onboard measurements and chassis dynamometer testing. The BMARS methodology is then applied to the NO _{ X } emission estimation.
Results
The model approach provides more reliable results of the estimation and offers better predictions of NO _{ X } emissions.
Conclusion
The results therefore suggest that the BMARS methodology is a useful and fairly accurate tool for predicting NO _{ X } emissions and it may be adopted by regulatory agencies.
Keywords
 Nitrogen oxide
 Onboard emission measurement system
 Chassis dynamometer testing system
 Emission
Background
Outdoor air pollution is reported to cause 1.3 million deaths worldwide each year (World Health Organization 2015). Alongside air pollutants from natural sources (Duc et al. 2013), man-made emissions have been the main concern in air-quality modelling and control. Vehicular emissions, in this context, can seriously affect air quality and have therefore received increasing research attention (Sharma et al. 2010). Road transport often appears as the single most important source of urban pollutant emissions in source apportionment studies (Maykut et al. 2007). In the coming decades, road transport is likely to remain a large contributor to air pollution, especially in urban areas.
For this reason, major efforts are being made to reduce polluting emissions from road transport. These include new powertrains and vehicle technology improvements, fuel refinements, optimization of urban traffic management and the implementation of tighter emission standards (Querol et al. 2007). In recent decades, many emission models have been developed. Afotey et al. (2013) proposed regression models to estimate light-duty gasoline vehicle emissions of CO _{2} based on vehicle velocity, acceleration, deceleration, power demand and time of the day. However, the model did not include NO _{ X } emissions. Oduro et al. (2013) proposed multiple regression models with instantaneous speed and acceleration as predictor variables to estimate vehicular emissions of CO _{2} but not NO _{ X }. Tóth-Nagy et al. (2006) proposed an artificial neural network-based model for predicting emissions of CO and NO _{ X } from heavy-duty diesel conventional and hybrid vehicles. The methodology sounds promising, but it was applied to heavy-duty vehicles only, and the fit function contains many details that make the model difficult to understand. An emission model based on instantaneous vehicle power, which is computed from total resistance force, vehicle mass, acceleration, velocity and driveline efficiency, was developed by Rakha et al. (2011). However, that model applies to fuel consumption and the CO _{2} emission factor and does not include NO _{ X } emissions.
A key gap in our understanding of these emissions is the effect of changes in vehicle speed, power and load on average emission rates for the on-road vehicle fleet. Vehicle power, load and speed are closely linked to fuel consumption and pollutant emission rates (European Commission White Paper 2015). An improved understanding of the link between operating conditions and emissions could lead to more accurate models for predicting vehicle emissions. The quality of any road vehicle emission model largely depends on the representativeness of the emission factors for pollutants such as carbon dioxide (CO _{2}), carbon monoxide (CO), nitrogen oxides (NO _{ X }), volatile organic compounds (VOCs) and particulate matter (PM). Representativeness here refers to the accuracy with which the emission factor describes the actual emission level of a particular vehicle type under the driving conditions applied to it.
This work focuses on using the MARS methodology to improve the accuracy of emission predictions based on chassis dynamometer and onboard measurement systems. Dynamometer testing is one of the three typical vehicle tailpipe emission measurement methods, in which emissions from vehicles are measured under laboratory conditions during a driving cycle that simulates vehicle road operations (Frey et al. 2003). Real-world onboard emissions measurement is widely recognized as a desirable approach for quantifying emissions from vehicles, since data are collected under real-world conditions at any location travelled by the vehicle (Durbin et al. 2007). Variability in vehicle emissions as a result of changes in facility (roadway) characteristics, vehicle location, vehicle operation, driver or other factors can be represented and analysed more reliably than with the other methods (Frey et al. 2002). This is because measurements are obtained during real-world driving, eliminating the concern about non-representativeness that is often an issue with dynamometer testing, and at any location, eliminating the setting restrictions inherent in remote sensing. Although this measuring technique seems more promising, improving the prediction accuracy of emission factors, especially for NO _{ X } emissions, by using effective statistical techniques is important in any emission inventory.
A number of the models discussed above either do not estimate NO _{ X } emissions or are so sophisticated as to require excessive data inputs. There needs to be a balance between a model's accuracy and detail and its ease of application. Therefore, to enhance the prediction performance for NO _{ X } emissions, the boosting MARS (BMARS) modelling approach is proposed in this paper. Here, we aim to estimate the NO _{ X } emissions with high accuracy. The effectiveness of the model is then determined by grouping the data into two parts, one for building the model (learning) and the other for validating the model (testing). The results are verified by comparing the experimental data with the BMARS and MARS predicted values. The remainder of this paper presents the data collection methods, namely via chassis dynamometer and onboard data collection, followed by the MARS model and BMARS methods.
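The learning/testing partition described above can be sketched as follows. This is a minimal illustration only: the function name `split_learning_testing`, the 30% testing fraction, the fixed seed and the toy records are all assumptions made for the example, since the split ratio is not stated here.

```python
import random


def split_learning_testing(records, test_fraction=0.3, seed=0):
    """Shuffle the records and return (learning, testing) lists.

    test_fraction and seed are hypothetical defaults, not values from the paper.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must lie strictly between 0 and 1")
    shuffled = list(records)                 # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)    # reproducible shuffle
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Toy records standing in for measured operating points (made-up values).
records = [{"speed": float(s), "nox": 0.01 * s} for s in range(10)]
learning, testing = split_learning_testing(records)
```

The learning part would be used to fit the MARS or BMARS model and the testing part only to evaluate it.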
Data collection methods
Chassis dynamometer
Onboard data collection
Methods
Multivariate Adaptive Regression Splines (MARS) were introduced for fitting the relationship between a set of predictors and dependent variables (Friedman 1991). MARS is a multivariate, piecewise regression technique that can be used to model complex relationships. The space of predictors is divided at multiple knots, and a spline function is fitted between these knots (Friedman 1991; MARS User Guide 2015). The basic problem in vehicular emission modelling is how best to determine the fundamental relationship between the dependent variable and the vector of predictors, such as speed, acceleration, load, power and ambient temperature, among other factors.
 1. Segregate operation parameters including speed, acceleration, power, load and ambient temperature from the raw data.
 2. Predict the emission NO _{ X } by using the approximate function \(\hat {f}(\cdot)\) with \(\{\beta ^{*}_{m}\}^{M}_{0}\) and \(\{h^{*}_{m}(\cdot)\}^{M}_{0}\), that is \(\hat {f}(x_{1i} \cdots x_{\textit {mi}})= \beta ^{*}_{0}+\sum \limits _{m=1}^{M}\beta ^{*}_{m}h^{*}_{m}(x_{1i} \cdots x_{\textit {mi}}), i=1\cdots N\), where \(\{x_{1i}\cdots x_{\textit {mi}}\}^{N}_{1}\) are from new measurements.
The basis functions, together with the model parameters, are combined to produce the predictions given the inputs. The general MARS model equation is given as:
$$ \hat{f}(X)= \beta_{0}+\sum\limits_{m=1}^{M}\beta_{m}h_{m}(X), $$(5)
where \(\{\beta_{m}\}^{M}_{0}\) are the coefficients of the model that are estimated to yield the best fit to the data, M is the number of subregions or the number of basis functions in the model, and h _{ m }(X) is the spline basis function given in (4).
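To make the weighted sum in (5) concrete, the sketch below evaluates an additive MARS model built from hinge functions. The helper names `hinge` and `mars_predict` are hypothetical. The intercept and the two speed terms are taken from the first rows of the onboard MARS table in the Results section, assuming each knot is subtracted from the variable. Interaction terms (products of hinges) are not covered.

```python
def hinge(x, knot, direction=1):
    """Piecewise-linear basis: max(0, x - knot) for direction=+1,
    max(0, knot - x) for direction=-1."""
    return max(0.0, direction * (x - knot))


def mars_predict(sample, intercept, terms):
    """Evaluate f(X) = beta_0 + sum_m beta_m * h_m(X).

    terms is a list of (variable, knot, direction, coefficient) tuples.
    """
    total = intercept
    for variable, knot, direction, coefficient in terms:
        total += coefficient * hinge(sample[variable], knot, direction)
    return total


# Illustrative subset of the onboard MARS model (speed terms only).
intercept = 0.249827
terms = [
    ("SPEED", 8.11, 1, -0.000142),
    ("SPEED", 11.67, 1, 0.000342),
]

# At 10 m/s only the first hinge is active: 0.249827 - 0.000142 * 1.89.
pred = mars_predict({"SPEED": 10.0}, intercept, terms)
```

Below the lowest knot every hinge is zero, so the prediction reduces to the intercept.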
This model searches over the space of all inputs and predictor values (referred to as “knots”) as well as the interactions between variables. During this search, an increasingly large number of basis functions are added to the model to minimize a lack-of-fit criterion. As a result of these operations, MARS automatically determines the most important independent variables as well as the most significant interactions among them. It is noted that the search for the best predictor and knot location is performed in an iterative process. The predictors and knot locations that contribute most to the model are selected first. Also, at the end of each iteration, the introduction of an interaction is checked for possible model improvements.
Model selection and pruning
where \(\left [1-\frac {\tilde {C}(M)}{N}\right ]^{2}\) is a complexity function, and \(\tilde {C}(M)\) is defined as \(\tilde {C}(M) = C(M)+dM\), in which C(M) is the number of parameters being fit and d represents a cost for each basis function optimization and is a smoothing parameter of the procedure. The higher the cost d is, the more basis functions will be eliminated (Put et al. 2004).
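The pruning criterion can be sketched numerically as below. The sketch assumes the standard generalized cross-validation form of Friedman (1991), in which the mean squared residual is divided by the complexity function; that numerator is an assumption here, because the criterion itself is not reproduced in this section. The function name `gcv` and the default d = 3 are also hypothetical.

```python
def gcv(y_true, y_pred, n_params, n_basis, d=3.0):
    """Generalized cross-validation score (assumed Friedman 1991 form).

    n_params is C(M), n_basis is M, and d is the per-basis-function cost,
    so the effective complexity is C~(M) = C(M) + d * M.
    """
    n = len(y_true)
    effective = n_params + d * n_basis
    if effective >= n:
        raise ValueError("effective complexity must be smaller than N")
    mean_sq_residual = sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n
    return mean_sq_residual / (1.0 - effective / n) ** 2


# Toy data: every prediction is off by 0.1, so the mean squared residual is 0.01.
y_true = [float(v) for v in range(1, 11)]
y_pred = [v + 0.1 for v in y_true]
score = gcv(y_true, y_pred, n_params=2, n_basis=1, d=3.0)   # 0.01 / 0.5**2
```

For the same fit, a larger d inflates the score, which is why a higher cost leads to more basis functions being pruned.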
Boosting algorithm
 1. Initialize \(F_{0}(x)=\bar {y}\).
 2. For m=1 to M do:
    Compute $$ \tilde{y}_{i}=y_{i}-F_{m-1}(x_{i}),\quad i=1,\ldots, N. $$(10)
    Compute $$ (\rho_{m}, a_{m})= \arg\min_{a,\rho}\sum_{i=1}^{N}[\tilde{y}_{i}-\rho h(x_{i}; a) ]^{2}. $$(11)
    Update the estimator at step m: $$ F_{m}(x)=F_{m-1}(x)+\rho_{m}h(x; a_{m}). $$(12)
 3. End for
 4. Output the final regression function \(F_{M}(x)\).
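The loop above can be sketched in a few lines. This is not the BMARS implementation: as a stand-in for a full MARS base learner, the base function h(x; a) is assumed to be a single hinge max(0, x − a) on one predictor, with the knot a chosen from a candidate list. All function names and the toy data are hypothetical.

```python
def hinge(x, knot):
    """Base learner h(x; a) = max(0, x - a) with a single knot a."""
    return max(0.0, x - knot)


def fit_step(xs, residuals, knots):
    """Return (rho, knot) minimising sum_i (r_i - rho * h(x_i; knot))^2, as in (11)."""
    best = None
    for knot in knots:
        h = [hinge(x, knot) for x in xs]
        denom = sum(v * v for v in h)
        if denom == 0.0:
            continue                              # hinge is zero on all samples
        rho = sum(r * v for r, v in zip(residuals, h)) / denom   # least-squares rho
        sse = sum((r - rho * v) ** 2 for r, v in zip(residuals, h))
        if best is None or sse < best[0]:
            best = (sse, rho, knot)
    if best is None:
        raise ValueError("no candidate knot produced a non-zero basis function")
    return best[1], best[2]


def boost(xs, ys, knots, n_rounds):
    """Least-squares boosting: F_0 is the mean of y, then residual fits are added."""
    f0 = sum(ys) / len(ys)
    preds = [f0] * len(xs)
    stages = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]             # step (10)
        rho, knot = fit_step(xs, residuals, knots)                 # step (11)
        stages.append((rho, knot))
        preds = [p + rho * hinge(x, knot) for p, x in zip(preds, xs)]  # step (12)
    return f0, stages


def predict(x, f0, stages):
    """Evaluate the final additive model F_M(x)."""
    return f0 + sum(rho * hinge(x, knot) for rho, knot in stages)


# Toy data with one kink at x = 5 (made-up values).
xs = [float(v) for v in range(11)]
ys = [1.0 + 2.0 * max(0.0, x - 5.0) for x in xs]
f0, stages = boost(xs, ys, knots=[float(k) for k in range(10)], n_rounds=20)
sse_mean = sum((y - f0) ** 2 for y in ys)
sse_boosted = sum((y - predict(x, f0, stages)) ** 2 for x, y in zip(xs, ys))
```

Because ρ = 0 is always an admissible choice, the training error cannot increase from one round to the next.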
Results and discussions
List of basis functions of the MARS and their coefficients for onboard measurements
| Beta factor | Basis function | Value |
| --- | --- | --- |
| BF0 | (intercept) | 0.249827 |
| BF1 | Max(0, SPEED - 8.11) | -0.000142 |
| BF2 | Max(0, SPEED - 11.67) | 0.000342 |
| BF3 | Max(0, SPEED - 12.52) | 0.000442 |
| BF4 | Max(0, SPEED - 16.39) | 0.0032363 |
| BF5 | Max(0, SPEED - 23.89) | 0.011587 |
| BF6 | Max(0, SPEED - 24.17) | 0.043903 |
| BF7 | Max(0.95 - ACCEL, 0) | -0.001307 |
| BF8 | Max(0, ACCEL - 1.25) | 0.007307 |
| BF9 | Max(0, ACCEL - 5.85) | 0.011308 |
| BF10 | Max(0, ACCEL - 7.21) | 0.031102 |
| BF11 | Max(0, AMBT - 22.12) | 0.000231 |
| BF12 | Max(0, AMBT - 23.47) | 0.003130 |
| BF13 | Max(0, AMBT - 24.76) | 0.021131 |
| BF14 | Max(0, LOAD - 10.53) | 0.015618 |
| BF15 | Max(0, LOAD - 52.34) | 0.017966 |
| BF16 | Max(0, LOAD - 60.16) | 0.023225 |
| BF17 | Max(0, POWER - 8.98) | 0.014877 |
| BF18 | Max(0, POWER - 21.32) | 0.015679 |
List of basis functions of the BMARS and their coefficients for onboard measurements
| Beta factor | Basis function | Value |
| --- | --- | --- |
| BF0 | (intercept) | 0.218753 |
| BF1 | Max(0, SPEED - 9.22) | -0.000112 |
| BF2 | Max(0, SPEED - 12.54) | -0.000212 |
| BF3 | Max(0, SPEED - 14.23) | 0.000211 |
| BF4 | Max(0, SPEED - 17.16) | 0.002417 |
| BF5 | Max(0, SPEED - 24.32) | 0.011213 |
| BF6 | Max(0.98 - ACCEL, 1.12) | -0.001126 |
| BF7 | Max(0, ACCEL - 2.61) | -0.004267 |
| BF8 | Max(0, ACCEL - 6.92) | 0.011211 |
| BF9 | Max(0, ACCEL - 7.56) | 0.024123 |
| BF10 | Max(0, AMBT - 22.51) | 0.000212 |
| BF11 | Max(0, AMBT - 23.78) | 0.003451 |
| BF12 | Max(0, LOAD - 13.15) | 0.013617 |
| BF13 | Max(0, LOAD - 54.56) | 0.016541 |
| BF14 | Max(0, POWER - 7.34) | 0.012346 |
| BF15 | Max(0, POWER - 9.76) | 0.013145 |
| BF16 | Max(0, POWER - 22.17) | 0.012678 |
List of basis functions of the MARS and their coefficients for dynamometer testing
| Beta factor | Basis function | Value |
| --- | --- | --- |
| BF0 | (intercept) | 0.313578 |
| BF1 | Max(0, SPEED - 6.43) | -0.000172 |
| BF2 | Max(0, SPEED - 9.36) | 0.000625 |
| BF3 | Max(0, SPEED - 18.37) | 0.005751 |
| BF4 | Max(0, SPEED - 25.14) | 0.063521 |
| BF5 | Max(0, ACCEL - 1.12) | 0.009433 |
| BF6 | Max(0, ACCEL - 4.24) | 0.056731 |
| BF7 | Max(0, ACCEL - 6.24) | 0.066312 |
| BF8 | Max(0, AMBT - 21.54) | 0.000321 |
| BF9 | Max(0, AMBT - 23.15) | 0.004433 |
| BF10 | Max(0, AMBT - 24.62) | 0.037215 |
| BF11 | Max(0, LOAD - 15.67) | 0.013211 |
| BF12 | Max(0, LOAD - 45.67) | 0.053412 |
| BF13 | Max(0, POWER - 13.76) | 0.016813 |
| BF14 | Max(0, POWER - 20.64) | 0.021213 |
List of basis functions of the BMARS and their coefficients for dynamometer testing
| Beta factor | Basis function | Value |
| --- | --- | --- |
| BF0 | (intercept) | 0.231567 |
| BF1 | Max(0, SPEED - 7.11) | -0.000134 |
| BF2 | Max(0, SPEED - 11.42) | 0.000514 |
| BF3 | Max(0, SPEED - 20.16) | 0.004671 |
| BF4 | Max(0, SPEED - 26.11) | 0.051411 |
| BF5 | Max(0, ACCEL - 1.23) | 0.009671 |
| BF6 | Max(0, ACCEL - 5.78) | 0.032143 |
| BF7 | Max(0, AMBT - 22.14) | 0.000221 |
| BF8 | Max(0, AMBT - 23.63) | 0.003133 |
| BF9 | Max(0, AMBT - 25.31) | 0.028912 |
| BF10 | Max(0, LOAD - 14.41) | 0.012761 |
| BF11 | Max(0, LOAD - 44.23) | 0.041671 |
| BF12 | Max(0, POWER - 15.72) | 0.015116 |
| BF13 | Max(0, POWER - 21.43) | 0.021551 |
The effect of speed on NO _{ X } emissions can be explained as follows. For the MARS model fitted to the onboard measurements, a vehicle speed below 8.11 m/s (29.2 km/h) for a short duration in traffic has a negligible impact on NO _{ X } emission (indicated by BF0). However, for a longer queuing time, such as in large cities, the amount of NO _{ X } emitted into the atmosphere can be significant, as the NO _{ X } emission increases with a corresponding increase in combustion temperature. The NO _{ X } effect increases as the speed rises above 11.67 m/s (42 km/h) (indicated by BF2–BF5), owing to the corresponding increase in combustion temperature. The emission rate can reach 0.043903 g/s when the speed is about 24.17 m/s (87 km/h) (indicated by BF6). This expected finding is consistent with previous findings in the literature. Carslaw et al. (2011) note that NO _{ X } emissions rise and fall in a reverse pattern to hydrocarbon (HC) emissions. As the speed of the vehicle increases, the mixture becomes leaner with more HCs; at the high temperatures in the combustion chamber, excess oxygen molecules appear and combine with nitrogen to form NO _{ X }. From Table 1, as the speed increases (indicated by BF2–BF6), the total NO _{ X } emitted from the tailpipe also increases. Beta factors BF7–BF10 in Table 1 show the nonlinear effect of vehicle acceleration on NO _{ X }, which can be described as follows. If the vehicle acceleration is less than 0.95 m/s ^{2}, NO _{ X } emission will reduce by 0.0013075 g/s (indicated by BF7), but if the acceleration is increased from 1.25 m/s ^{2} to 5.85 m/s ^{2}, the NO _{ X } emission will increase by 0.0113075 g/s (indicated by BF8 and BF9). The NO _{ X } emission can reach more than 0.0311017 g/s when the acceleration exceeds 7.21 m/s ^{2}. This result is similar to that for speed, because depressing the accelerator pedal increases acceleration and speed simultaneously.
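The speed thresholds quoted in this discussion are the knots of the speed basis functions, expressed in m/s. As a small worked check, the sketch below converts the onboard MARS speed knots to km/h and lists which hinges are active at a given speed. The names `SPEED_KNOTS_MS`, `ms_to_kmh` and `active_speed_knots` are hypothetical, and the knots are read from the onboard MARS table assuming each value is subtracted from SPEED.

```python
# Speed knots (m/s) of BF1..BF6 in the onboard MARS table.
SPEED_KNOTS_MS = [8.11, 11.67, 12.52, 16.39, 23.89, 24.17]


def ms_to_kmh(speed_ms):
    """Convert metres per second to kilometres per hour."""
    return speed_ms * 3.6


def active_speed_knots(speed_ms, knots=SPEED_KNOTS_MS):
    """Knots whose hinge max(0, SPEED - knot) is non-zero at this speed."""
    return [knot for knot in knots if speed_ms > knot]


knots_kmh = [round(ms_to_kmh(k), 1) for k in SPEED_KNOTS_MS]
# knots_kmh is [29.2, 42.0, 45.1, 59.0, 86.0, 87.0]

active_at_15 = active_speed_knots(15.0)   # [8.11, 11.67, 12.52]
```

At 15 m/s (54 km/h), for example, only the first three speed hinges contribute to the prediction.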
Comparison of MARS, BMARS and MLR models
| Model | RMSE | MSE | R ^{2} |
| --- | --- | --- | --- |
| MARS-OBS | 0.00016 | 2.565×10^{−8} | 0.87 |
| MARS-DYN | 0.00022 | 4.642×10^{−8} | 0.77 |
| BMARS-OBS | 0.00011 | 1.236×10^{−8} | 0.93 |
| BMARS-DYN | 0.00014 | 1.905×10^{−8} | 0.89 |
| MLR-OBS | 0.00046 | 2.571×10^{−8} | 0.51 |
| MLR-DYN | 0.00048 | 3.124×10^{−8} | 0.50 |
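The three measures reported in the comparison can be computed as in the sketch below. It uses the conventional definitions (MSE as the mean squared residual, RMSE as its square root, and R ^{2} as one minus the ratio of residual to total sum of squares); these definitions are assumed, since they are not spelled out here. The function name `regression_metrics` and the toy numbers are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE and R^2 for paired observations and predictions."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))   # residual sum of squares
    ss_tot = sum((a - mean_y) ** 2 for a in y_true)              # total sum of squares
    mse = ss_res / n
    return {"MSE": mse, "RMSE": math.sqrt(mse), "R2": 1.0 - ss_res / ss_tot}


# Toy example with made-up values: MSE = 0.125 and R^2 = 0.9.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 3.0, 3.5])
```

Under these definitions RMSE is always the square root of MSE, so the two columns carry the same information on different scales.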
Conclusion
In this paper, we have proposed the use of Multivariate Adaptive Regression Splines (MARS) and Boosting Multivariate Adaptive Regression Splines (BMARS) algorithms to effectively estimate vehicular NO _{ X } emissions. The model approximates the nonlinear relationship between the NO _{ X } emission and the predictor variables: speed, acceleration, temperature, power and load. The BMARS model is implemented with 14 and 17 piecewise-linear basis functions (BFs), while the MARS model uses 19 and 15 BFs. The model predicts the NO _{ X } emission by forming a weighted sum of basis functions of the predictor variables; thus, the predicted emission changes in a smooth and regular fashion with respect to the input variations, offering some performance improvements. The results obtained indicate a promising application of the proposed algorithms to estimate NO _{ X } emissions with reasonable accuracy. The method may usefully assist policy decision-making regarding urban air pollution.
Declarations
Acknowledgements
The data used for this study were supplied by the Road and Maritime Service, Department of vehicle emission, Compliance Technology & Compliance Operations, NSW Office of Environment & Heritage, and Horiba Australia. Assistance provided by Paul Walker and Thomas Mahsling is gratefully acknowledged.
Authors’ Affiliations
References
 Afotey, B, Sattler, M, Mattingly, SP, Chen, VCP (2013). Statistical model for estimating carbon dioxide emissions from a light-duty gasoline vehicle. Journal of Environmental Protection, 4, 8–15.
 Carslaw, D, Beevers, S, Westmoreland, E, Williams, M, Tate, J, Murrells, T, Stedman, J, Li, Y, Grice, S, Kent, A, Tsagatakis, I. (2011). Trends in NO _{ X } and NO _{2} Emissions and Ambient Measurements in the UK. King’s College London: University of Leeds Press.
 Duc, H, Azzi, M, Wahid, H, Ha, QP (2013). Background ozone level in the Sydney basin: assessment and trend analysis. International Journal of Climatology, 33, 2298–2308.
 Durbin, T, Johnson, K, Cocker, ID, Miller, J, Maldonado, H, Shah, A, Ensfield, C, Weaver, C, Akard, M, Harvey, N (2007). Evaluation and comparison of portable emissions measurement systems and federal reference methods for emissions from a backup generator and a diesel truck operated on a chassis dynamometer. Environmental Science & Technology, 41, 6199–6204.
 Freund, Y, & Schapire, R (1997). A decision-theoretic generalization of online learning and an application to boosting. Journal of Computer and System Sciences, 55, 119–139.
 Frey, HC, Unal, A, Rouphail, NM, Colyar, JD (2002). Use of onboard tailpipe emissions measurements for development of mobile source emission factors. In Proceedings of US Environmental Protection Agency Emission Inventory Conference, Atlanta, April, (pp. 1–13).
 Frey, HC, Unal, A, Rouphail, NM (2003). On-road measurement of vehicle tailpipe emissions using a portable instrument. Journal of the Air and Waste Management Association, 53, 992–1002.
 Friedman, JH (1991). Multivariate adaptive regression splines. Annals of Statistics, 19, 1–141.
 Hastie, T, Tibshirani, R, Friedman, J. (2001). The elements of statistical learning: Data mining, inference and prediction, (pp. 337–343). Stanford, California: Springer-Verlag.
 Jerome, HF (2001). Greedy function approximation: A gradient boosting mac. In The Annals of Statistics, (Vol. 29. Institute of Mathematical Statistics, Chapman and Hall, pp. 1189–1232).
 Maykut, NN, Lewtas, J, Kim, E, Larson, TV (2007). Source apportionment of PM 2.5 at an urban IMPROVE site in Seattle, Washington. Environmental Science and Technology, 37, 5135–5142.
 Nine, RD, Clark, N, Daley, JJ, Atkinson, CM (1999). Development of a heavy-duty chassis dynamometer driving route. Journal of Automobile Engineering, 213, 561–574.
 Oduro, SD, Metia, S, Duc, H, Ha, QP (2013). CO _{2} vehicular emission statistical analysis with instantaneous speed and acceleration as predictor variables. In The 2nd International Conference on Control, Automation and Information Sciences. Nha Trang, Vietnam, (pp. 158–163).
 Oduro, SD, Metia, S, Duc, H, Hong, G, Ha, QP (2014). Prediction of NO _{ X } vehicular emissions using onboard measurement and chassis dynamometer testing. In Proceedings of The 31st International Symposium on Automation and Robotics in Construction and Mining. University of Technology, Sydney, Sydney Australia, (pp. 584–591).
 Pierson, WR, Gertler, AW, Robinson, NF, Sagebiel, JC, Zielinska, B, Bishop, AW, Stedman, DH, Zweidinger, RB, Ray, WD (1996). Real world automotive emissions summary of studies in the Fort McHenry and Tuscarora mountain tunnels. Atmospheric Environment, 30, 2233–2256.
 Put, R, Xu, Q, Massart, D, Heyden, Y (2004). Multivariate adaptive regression splines (MARS) in chromatographic quantitative structure-retention relationship studies. Journal of Chromatography, 1055, 11–19.
 Querol, X, Viana, M, Alastuey, A, Amato, F, Moreno, T, Castillo, S, Pey, J, de la Rosa, J, Sánchez de la Campa, A, Artíñano, B, Salvador, P, García Dos Santos, S, Fernández-Patier, R, Moreno-Grau, S, Negral, L, Minguillón, MC, Monfort, E, Gil, JI, Inza, A, Ortega, LA, Santamaría, JM, Zabalzah, J (2007). Source origin of trace elements in PM from regional background, urban and industrial sites of Spain. Atmospheric Environment, 44, 7219–7231.
 Rakha, H, Ahn, K, Moran, K, Saerens, B, den Bulck E, V (2011). Simple Comprehensive Fuel Consumption and CO2 Emissions Model Based on Instantaneous Vehicle Power. In Transportation Research Board 90th Annual Meeting, Washington DC, 23–27 January 2011, Paper No. 111009.
 Sharma, AR, Kharol, SK, Badarinath, KVS (2010). Influence of vehicular traffic on urban air quality – a case study of Hyderabad, India. Transportation Research Part D: Transport and Environment, 15, 154–159.
 Tóth-Nagy, C, Conley, JJ, Jarrett, RP, Clark, NN (2006). Further validation of artificial neural network-based emissions simulation models for conventional and hybrid electric vehicles. Journal of the Air & Waste Management Association, 56, 898–910.
 European Commission White Paper. Road to a Single European Transport Area Towards a Competitive and Resource Efficient Transport System, Brussels, COM, 144 Final. http://ec.europa.eu/transport/strategies/2011_white_paper_en.htm.
 MARS User Guide: San Diego, Salford System. http://www.salfordsystems.com.
 World Health Organization Fact Sheet No 313 “Air Quality and Health Updated”. http://www.who.int/mediacentre/factsheets/fs313/en/.
Copyright
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.