Multivariate adaptive regression splines models for vehicular emission prediction

Oduro, Seth Daniel; Metia, Santanu; Duc, Hiep; Hong, Guang; Ha, Q.P.

doi:10.1186/s40327-015-0024-4

Research
Open access
Published: 10 June 2015

Visualization in Engineering volume 3, Article number: 13 (2015)

Abstract

Background

Rate models for predicting vehicular emissions of nitrogen oxides (NOₓ) are insensitive to the vehicle modes of operation, such as cruise, acceleration, deceleration and idle, because these models are usually based on the average trip speed. This study demonstrates the feasibility of using other variables, such as vehicle speed, acceleration, load, power and ambient temperature, to predict NOₓ emissions, so that the emission inventory is accurate and air quality modelling and management plans can be designed and implemented appropriately.

Methods

We propose the non-parametric Boosting-Multivariate Adaptive Regression Splines (B-MARS) algorithm to improve the accuracy of Multivariate Adaptive Regression Splines (MARS) modelling in predicting vehicular NOₓ emissions from on-board measurements and chassis dynamometer testing. The B-MARS methodology is then applied to NOₓ emission estimation.

Results

The proposed modelling approach yields more reliable estimates and better predictions of NOₓ emissions.

Conclusion

The results suggest that the B-MARS methodology is a useful and fairly accurate tool for predicting NOₓ emissions and that it may be adopted by regulatory agencies.

Background

Outdoor air pollution is reported to cause 1.3 million deaths worldwide each year (World Health Organization 2015). Although some air pollutants come from natural sources (Duc et al. 2013), man-made emissions have been the main concern in air-quality modelling and control. Vehicular emissions, in this context, can seriously affect air quality and have therefore received increasing research attention (Sharma et al. 2010). Road transport often appears as the single most important source of urban pollutant emissions in source apportionment studies (Maykut et al. 2007). In the coming decades, road transport is likely to remain a large contributor to air pollution, especially in urban areas.

For this reason, major efforts are being made to reduce polluting emissions from road transport. These include new powertrains and vehicle technology improvements, fuel refinements, optimization of urban traffic management and the implementation of tighter emission standards (Querol et al. 2007). In recent decades, many emission models have been developed. Afotey et al. 2013 proposed regression models to estimate light-duty gasoline vehicle emissions of CO₂ based on vehicle velocity, acceleration, deceleration, power demand and time of day. However, the model did not include NOₓ emissions. Oduro et al. 2013 proposed multiple regression models with instantaneous speed and acceleration as predictor variables to estimate vehicular emissions of CO₂ but not NOₓ. Tóth-Nagy et al. 2006 proposed an artificial neural network-based model for predicting emissions of CO and NOₓ from heavy-duty diesel conventional and hybrid vehicles. The methodology is promising, but it was applied to heavy-duty vehicles only, and the fitted function contains many details that make the model difficult to understand. An emission model based on instantaneous vehicle power, computed from total resistance force, vehicle mass, acceleration, velocity and drive-line efficiency, was developed by Rakha et al. 2011. However, that model covers fuel consumption and the CO₂ emission factor and does not include NOₓ emissions.

A key gap in our understanding of these emissions is the effect of changes in vehicle speed, power and load on average emission rates for the on-road vehicle fleet. Vehicle power, load and speed are closely linked to fuel consumption and pollutant emission rates (European Commission White Paper 2015). An improved understanding of the link between operating conditions and emissions could support the development of accurate models for predicting vehicle emissions. The quality of any road vehicle emission model largely depends on the representativeness of the emission factors for pollutants such as carbon dioxide (CO₂), carbon monoxide (CO), nitrogen oxides (NOₓ), volatile organic compounds (VOCs) and particulate matter (PM). Representativeness here refers to the accuracy with which an emission factor describes the actual emission level of a particular vehicle type under the driving conditions applied to it.

This work focuses on using the MARS methodology to improve the accuracy of emission predictions based on chassis dynamometer and on-board measurement systems. Dynamometer testing is one of the three typical vehicle tailpipe emission measurement methods, in which emissions are measured under laboratory conditions during a driving cycle that simulates on-road operation (Frey et al. 2003). Real-world on-board emissions measurement is widely recognized as a desirable approach for quantifying vehicle emissions, since data are collected under real-world conditions at any location travelled by the vehicle (Durbin et al. 2007). Variability in vehicle emissions resulting from changes in facility (roadway) characteristics, vehicle location, vehicle operation, driver or other factors can be represented and analysed more reliably than with the other methods (Frey et al. 2002). This is because measurements are obtained during real-world driving, which eliminates the concern about non-representativeness that is often an issue with dynamometer testing, and at any location, which eliminates the setting restrictions inherent in remote sensing. Although this measuring technique appears more promising, improving the prediction accuracy of emission factors, especially for NOₓ, through effective statistical techniques remains important for any emission inventory.

A number of the models discussed above either do not estimate NOₓ emissions or are so sophisticated that they require excessive data inputs. A model needs to balance accuracy and detail against ease of application. Therefore, to enhance prediction performance for NOₓ emissions, the boosting MARS (B-MARS) modelling approach is proposed in this paper, with the aim of estimating NOₓ emissions with high accuracy. The effectiveness of the model is determined by splitting the data into two parts, one for building the model (learning) and the other for validating it (testing). The results are verified by comparing the experimental data with the values predicted by B-MARS and MARS. The remainder of this paper presents the data collection methods, namely chassis dynamometer testing and on-board data collection, followed by the MARS model and the B-MARS method.

Data collection methods

Chassis dynamometer

This study uses secondary data collected by the New South Wales (NSW) Road and Maritime Service (RMS), Department of Vehicle Emission, Compliance Technology Operation. The data were collected on a second-by-second basis, and four vehicles were used for the test. The test vehicles were Toyota, Ford, Holden and Nissan models from the 2007 and 2008 model years, with engine displacements ranging from 1.8 L to 2.0 L. A chassis dynamometer set-up in the laboratory simulates the resistive power imposed on the wheels of a vehicle, as shown in Fig. 1. It consists of a dynamometer coupled either to drive lines that are directly connected to the wheel hubs of the vehicle, or to a set of rollers upon which the vehicle is placed and which can be adjusted to simulate driving resistance. During testing, the vehicle is tied down so that it remains stationary while a driver operates it according to a predetermined time-speed profile and gear change pattern shown on a monitor, matching the speed required at the different stages of the driving cycle (Nine et al. 1999). Experienced drivers are able to closely match the established speed profile.

On-board data collection

Data from on-board instruments can facilitate the development of micro-scale emission models (Frey et al. 2003). Compared with conventional dynamometer tests under carefully controlled conditions, on-road data reflect real driving situations. Accordingly, second-by-second emissions data were collected using a Horiba On-Board Measurement System (OBS-2000), as shown in Fig. 2, with the same test vehicles as in the dynamometer test cycle. The equipment is composed of two on-board gas analysers, a laptop computer equipped with data logger software, a power supply unit, a tailpipe attachment and other accessories. The OBS-2000 collects second-by-second measurements of nitrogen oxides (NOₓ), hydrocarbons (HC), carbon monoxide (CO), carbon dioxide (CO₂), exhaust temperature, exhaust pressure and vehicle position (via a global positioning system, or GPS). Although the instrument measured other pollutants, the focus of this work was to build a model for NOₓ emissions. For the measurement scale used, the accuracy of the NOₓ emission measurements, reported in percentage, was ±0.3 %. A two-second lag in the NOₓ emission measurement was accounted for in the data spreadsheets. NOₓ sensor calibration was carried out throughout the data collection period. Proper maintenance and diagnostic procedures were strictly followed to keep data collection consistent and free of frequent interruptions from unit malfunctions, batteries failing to hold charge or calibration issues.
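The two-second lag correction mentioned above can be pictured as a simple shift between the predictor series and the NOₓ series. The following is only a sketch of one way such an alignment could be done, not the procedure actually applied to the spreadsheets. It assumes that the analyser reading trails the vehicle state by a fixed number of one-second samples; the function name `align_with_lag` and the sample values are hypothetical.

```python
def align_with_lag(predictor_rows, nox_readings, lag_samples=2):
    """Pair each predictor row at time t with the NOx reading at t + lag_samples.

    Assumes both lists are sampled once per second and start at the same time,
    and that the analyser reading trails the vehicle state (an assumption).
    Rows at the end that have no delayed reading yet are dropped.
    """
    if lag_samples < 0:
        raise ValueError("lag_samples must be non-negative")
    if len(predictor_rows) != len(nox_readings):
        raise ValueError("both series must have the same length")
    usable = len(nox_readings) - lag_samples
    return [
        (predictor_rows[t], nox_readings[t + lag_samples])
        for t in range(max(usable, 0))
    ]


# Made-up speeds (m/s) and NOx readings (g/s), one value per second.
speeds = [0.0, 2.0, 4.0, 6.0, 8.0]
nox = [0.000, 0.000, 0.001, 0.002, 0.003]
aligned = align_with_lag(speeds, nox)
print(aligned)
```

With a lag of two samples, the last two predictor rows have no matching reading and are discarded, so the aligned series is two samples shorter than the input.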

Methods

Multivariate Adaptive Regression Splines (MARS) were introduced for fitting the relationship between a set of predictors and a dependent variable (Friedman 1991). MARS is a multivariate, piecewise regression technique that can be used to model complex relationships. The predictor space is partitioned at a number of knots, and a spline function is fitted between these knots (Friedman 1991; MARS User Guide 2015). The basic problem in vehicular emission modelling is how best to determine the fundamental relationship between the dependent variable and a vector of predictors, such as speed, acceleration, load, power and ambient temperature, among other factors.

The MARS algorithm searches over all possible univariate hinge locations and across interactions among all variables. It does so through combinations of variables called basis functions, an approach analogous to the use of splines. This study explores the potential of applying the MARS methodology to model NOₓ emissions using the following input parameters from chassis dynamometer and on-board emission measurements: speed, acceleration, load, power and ambient temperature. The problem can be stated as a multivariate regression problem. Suppose that $N$ input-output samples are available, $\{y_{i},x_{1i},\cdots,x_{mi}\}^{N}_{1}$, where the dependent variable $y_{i}$, $i=1,2,\cdots,N$, is the $i$th measurement of NOₓ and the predictor $x_{li}$, $i=1,2,\cdots,N$, $l=1,2,\cdots,m$, is the $i$th measurement of the $l$th parameter. We assume that the data $\{y_{i},x_{1i},\cdots,x_{mi}\}^{N}_{1}$ are related through the following equation

$$ y=f(x_{1},\cdots,x_{m}),\enskip (x_{1},\cdots,x_{m})\in D \subset R^{m}, \tag{1} $$

where f(·) is an unknown multivariate deterministic function and D is the domain of the inputs. Since the true mapping in (1) is not known, it is desired to have a function $\hat {f}(x_{1},\cdots,x_{m})$ that provides a “good” fit to the output data. The goodness of fit between $\hat {f}(x_{1},\cdots,x_{m})$ and the output data is measured by the estimated mean square error (MSE):

$$ MSE=\frac{1}{N}\sum\limits_{i=1}^{N}\left[y_{i} -\hat{f}(x_{1i}, \cdots, x_{mi}) \right]^{2}. \tag{2} $$

To regularize the problem, that is, to make it well-posed, the solution $\hat {f}(x_{1},\cdots,x_{m})$ is restricted to functions residing in the linear space:

$$ \hat{f}(\cdot)=\beta_{0}+\sum\limits_{m=1}^{M}\beta_{m}h_{m}(\cdot), \tag{3} $$

where $\{h_{m}(\cdot)\}^{M}_{m=1}$ is a set of basis functions and $\{\beta _{m}\}^{M}_{m=0}$ are the coefficients of the representation. In this paper, $h_{m}(\cdot)$ is the spline basis function defined as:

$$ h_{m}(\cdot)=\prod\limits_{k=1}^{K_{m}}\left[s_{k,m}\cdot(x_{v(k,m)}-t_{k,m})\right]_{+}, \tag{4} $$

where $s_{k,m}$ are variables that take the values ±1, $v(k,m)$ labels the predictor variables and $t_{k,m}$ represents the estimated values (knot locations) on the corresponding variables. The quantity $K_{m}$ is the number of “splits” that give rise to the basis function $h_{m}$. Here the subscript “+” indicates a value of zero for negative values of the argument. The basis functions in (4) are known as “hockey stick” basis functions. MARS searches over the space of all inputs and predictor values (knots) as well as interactions between variables. Given the estimated coefficients $\{\beta ^{*}_{m}\}^{M}_{0}$, the basis functions $\{h^{*}_{m}(\cdot)\}^{M}_{0}$ and the operation parameters describing a new measurement, the emission for the new measurement can be predicted by taking the following steps:

1. Segregate the operation parameters, including speed, acceleration, power, load and ambient temperature, from the raw data.
2. Predict the NOₓ emission by using the approximate function $\hat {f}(\cdot)$ with $\{\beta ^{*}_{m}\}^{M}_{0}$ and $\{h^{*}_{m}(\cdot)\}^{M}_{0}$, that is, $\hat {f}(x_{1i}, \cdots, x_{mi})= \beta ^{*}_{0}+\sum \limits _{m=1}^{M}\beta ^{*}_{m}h^{*}_{m}(x_{1i}, \cdots, x_{mi})$, $i=1,\cdots,N$, where $\{x_{1i},\cdots, x_{mi}\}^{N}_{1}$ are from the new measurements.

The basis functions, together with the model parameters, are combined to produce the predictions given the inputs. The general MARS model equation is given as:

$$ \hat{f}(X)= \beta_{0}+\sum\limits_{m=1}^{M}\beta_{m}h_{m}(X), \tag{5} $$

where $\{\beta_{m}\}^{M}_{0}$ are the coefficients of the model, estimated to yield the best fit to the data, $M$ is the number of sub-regions or, equivalently, the number of basis functions in the model, and $h_{m}(X)$ is the spline basis function given in (4).
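As an illustration of how Eqs. (4) and (5) turn a set of knots and coefficients into a prediction, the sketch below evaluates a small MARS-style model in plain Python. The function names, the two-term model, and all knots and coefficients are hypothetical and are not taken from Tables 1 to 4.

```python
from math import prod


def hinge(x, knot, sign=1):
    """Truncated linear ("hockey stick") term [sign * (x - knot)]_+ ."""
    return max(0.0, sign * (x - knot))


def basis_function(x, factors):
    """Product of hinge terms, following the form of Eq. (4).

    x is a dict of predictor values; factors is a list of
    (variable_name, knot, sign) triples, one per split k = 1..K_m.
    """
    return prod(hinge(x[var], knot, sign) for var, knot, sign in factors)


def mars_predict(x, intercept, terms):
    """Intercept plus weighted sum of basis functions, as in Eq. (5).

    terms is a list of (coefficient, factors) pairs.
    """
    return intercept + sum(beta * basis_function(x, f) for beta, f in terms)


# Hypothetical model: one main effect and one two-way interaction.
terms = [
    (0.002, [("speed", 10.0, +1)]),                      # speed above 10 m/s
    (0.001, [("speed", 10.0, +1), ("accel", 1.0, +1)]),  # speed x acceleration
]
x_new = {"speed": 15.0, "accel": 2.0}
print(mars_predict(x_new, 0.01, terms))
```

A basis function with a single factor is a main effect, while a product of two or more hinge terms represents an interaction between predictors.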

This model searches over the space of all inputs and predictor values (referred to as “knots”) as well as the interactions between variables. During this search, an increasing number of basis functions are added to the model to minimize a lack-of-fit criterion. As a result, MARS automatically determines the most important independent variables as well as the most significant interactions among them. The search for the best predictor and knot location is performed iteratively: the predictors and knot locations that contribute most to the model are selected first, and at the end of each iteration the introduction of an interaction is checked for possible model improvements.
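The knot search can be illustrated on a single predictor. The sketch below is a deliberately reduced version of the forward step: it tries every observed value as a candidate knot, fits an intercept plus one hinge term by ordinary least squares, and keeps the knot with the smallest sum of squared errors. A full MARS implementation also adds mirrored hinge pairs, loops over all predictors and considers interactions; the function names and the synthetic data here are assumptions made for illustration.

```python
def fit_line(z, y):
    """Ordinary least squares for y ~ b0 + b1 * z; returns (b0, b1, sse)."""
    n = len(z)
    z_mean = sum(z) / n
    y_mean = sum(y) / n
    szz = sum((zi - z_mean) ** 2 for zi in z)
    szy = sum((zi - z_mean) * (yi - y_mean) for zi, yi in zip(z, y))
    b1 = szy / szz if szz > 0 else 0.0
    b0 = y_mean - b1 * z_mean
    sse = sum((yi - (b0 + b1 * zi)) ** 2 for zi, yi in zip(z, y))
    return b0, b1, sse


def best_knot(x, y):
    """Try every observed x value as a knot and keep the lowest-SSE fit."""
    best = None
    for t in sorted(set(x)):
        z = [max(0.0, xi - t) for xi in x]  # hinge feature max(0, x - t)
        b0, b1, sse = fit_line(z, y)
        if best is None or sse < best[3]:
            best = (t, b0, b1, sse)
    return best


# Synthetic data: flat at 1.0 until x = 4, then rising with slope 2.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8]
y = [1.0 + 2.0 * max(0.0, xi - 4) for xi in x]
knot, b0, b1, sse = best_knot(x, y)
print(knot, b0, b1, sse)
```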

Model selection and pruning

In general, non-parametric models are adaptive and can exhibit a high degree of flexibility that may ultimately result in overfitting if no measures are taken to counteract it. MARS therefore follows the forward search described above with a second step, the pruning step, in which a “one-at-a-time” backward deletion procedure eliminates the basis functions with the least contribution to the model. This pruning is based on a generalized cross-validation (GCV) criterion. The GCV criterion is used to find the overall best model from a sequence of fitted models, where a larger GCV penalty tends to produce a smaller model, and vice versa. The GCV criterion is estimated as the lack-of-fit criterion (Hastie et al. 2001):

$$ \text{GCV}=\frac{1}{N}\frac{\sum\limits_{i=1}^{N}\left(y_{i}-\hat{f}(X_{i})\right)^{2}}{\left[1-\frac{\tilde{C}(M)}{N}\right]^{2}}, \tag{6} $$

where $\left [1-\frac {\tilde {C}(M)}{N}\right ]^{2}$ is a complexity function, and $\tilde {C}(M)$ is defined as $\tilde {C}(M) = C(M)+dM$, of which C(M) is the number of parameters being fit and d represents a cost for each basis function optimization and is a smoothing parameter of the procedure. The higher the cost d is, the more basis functions will be eliminated (Put et al. 2004).
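To make the role of the penalty in Eq. (6) concrete, the following sketch computes the GCV score for a given set of fitted values. It assumes that C(M) equals the number of coefficients, M + 1, and uses d = 3 purely as an illustrative value; neither choice is stated in the text, and the function name `gcv` is hypothetical.

```python
def gcv(y, y_hat, n_basis, d=3.0):
    """Generalized cross-validation score following Eq. (6).

    Assumption: C(M) = n_basis + 1 (one coefficient per basis function plus
    the intercept), so that C~(M) = C(M) + d * n_basis.
    """
    n = len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    c_tilde = (n_basis + 1) + d * n_basis
    if c_tilde >= n:
        raise ValueError("effective number of parameters must be below N")
    return (rss / n) / (1.0 - c_tilde / n) ** 2


# Two made-up models with identical residuals but different sizes.
y_obs = [float(i) for i in range(50)]
y_fit = [yi + 0.1 for yi in y_obs]
print(gcv(y_obs, y_fit, n_basis=2))
print(gcv(y_obs, y_fit, n_basis=5))
```

For the same residual sum of squares, the model with more basis functions receives the larger GCV score, which is why backward deletion can prefer the smaller model.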

Boosting algorithm

To extend the results obtained in (Oduro et al. 2014), we propose to use a boosting algorithm to improve the performance of the MARS model. The algorithm, introduced by Freund and Schapire 1997, has been successfully applied to several benchmark machine learning problems using supervised learning. Basically, boosting forms a strong learner by combining multiple weak learners, whereby each new classifier is generated from the results of the previously generated classifiers, focusing on misclassified samples. The algorithm increases the weights of incorrectly classified samples and decreases the weights of those classified correctly. The problem of applying least-squares boosting (LS-Boosting) can be formulated as follows. Given an input (feature) vector $x$, a response variable $y$ and samples $\{y_{i},x_{i}\}^{N}_{i=1}$, the goal is to obtain an estimate or approximation $\hat {F}(x)$ of the function $F^{*}(x)$ mapping $x$ to $y$ that minimises the expected value of some specified loss function $L(y,F(x))$ over the joint distribution of all $(y,x)$ values:

$$ F^{*}=\arg\min_{F} E_{y,x}\,L(y,F(x)). \tag{7} $$

In least squares (LS) boosting, the squared error loss is given by L(y,F)=(y−F)²/2 and the pseudo-response is obtained as

$$ \tilde{y}_{i} = -\left[\frac{\partial L(y_{i},F(x_{i}))}{\partial F(x_{i})}\right]_{F(x)=F_{m-1}(x)} = y_{i}-F_{m-1}(x_{i}), \quad i=1,2,\ldots,N. \tag{8} $$

Thus, for i=1,2,...,N the minimisation of the data based estimate of the expected loss gives

$$ (\rho_{m}, a_{m})= \arg\min_{a,\rho}\sum_{i=1}^{N}\left[\tilde{y}_{i}-\rho h(x_{i}; a)\right]^{2}, \tag{9} $$

where $h(x;a)$ is the weak learner or base learner with basis functions $\{{h(x, a_{m})}\}^{M}_{m=1}$ and $\rho_{m}$ is the corresponding multiplier. The LS-boost algorithm is summarised below (Friedman 2001).

1. Initialize $F_{0}(x)=\bar {y}$.
2. For $m=1$ to $M$ do:
- Compute the pseudo-responses
  $$ \tilde{y}_{i}=y_{i}-F_{m-1}(x_{i}),\quad i=1,\ldots,N. \tag{10} $$
- Compute the multiplier and the weak-learner parameters
  $$ (\rho_{m}, a_{m})= \arg\min_{a,\rho}\sum_{i=1}^{N}[\tilde{y}_{i}-\rho h(x_{i}; a) ]^{2}. \tag{11} $$
- Update the estimator at step $m$:
  $$ F_{m}(x)=F_{m-1}(x)+\rho_{m}h(x; a_{m}). \tag{12} $$
3. End for.
4. Output the final regression function $F_{M}(x)$.
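The LS-boost steps listed above can be sketched in a few lines of Python. This is a minimal illustration rather than the B-MARS implementation used in the study: the weak learner is assumed to be a single hinge function of one predictor, with parameters a = (knot, sign), and the data are synthetic. All function names are hypothetical.

```python
def hinge(x, knot, sign):
    """Weak learner h(x; a) with a = (knot, sign): [sign * (x - knot)]_+ ."""
    return max(0.0, sign * (x - knot))


def fit_weak_learner(x, residuals):
    """Eq. (11): choose (rho, knot, sign) minimising the squared error."""
    best = None
    for knot in sorted(set(x)):
        for sign in (1, -1):
            h = [hinge(xi, knot, sign) for xi in x]
            hh = sum(hi * hi for hi in h)
            if hh == 0.0:
                continue  # this hinge is zero on every sample
            rho = sum(ri * hi for ri, hi in zip(residuals, h)) / hh
            sse = sum((ri - rho * hi) ** 2 for ri, hi in zip(residuals, h))
            if best is None or sse < best[0]:
                best = (sse, rho, knot, sign)
    if best is None:  # degenerate data: fall back to a zero update
        return 0.0, x[0], 1
    return best[1:]


def ls_boost(x, y, n_rounds):
    """Return F_0 and the list of (rho, knot, sign) added in each round."""
    f0 = sum(y) / len(y)                                     # step 1
    learners = []
    pred = [f0] * len(y)
    for _ in range(n_rounds):                                # step 2
        residuals = [yi - pi for yi, pi in zip(y, pred)]     # Eq. (10)
        rho, knot, sign = fit_weak_learner(x, residuals)     # Eq. (11)
        learners.append((rho, knot, sign))
        pred = [pi + rho * hinge(xi, knot, sign)             # Eq. (12)
                for xi, pi in zip(x, pred)]
    return f0, learners


def boost_predict(x_value, f0, learners):
    return f0 + sum(rho * hinge(x_value, k, s) for rho, k, s in learners)


def training_mse(x, y, f0, learners):
    return sum((yi - boost_predict(xi, f0, learners)) ** 2
               for xi, yi in zip(x, y)) / len(y)


# Synthetic one-dimensional data with a smooth nonlinear response.
x = [float(i) for i in range(10)]
y = [0.1 * xi ** 2 for xi in x]
for rounds in (0, 1, 5, 20):
    print(rounds, training_mse(x, y, *ls_boost(x, y, rounds)))
```

Because each round fits the current residuals by least squares, the training error cannot increase from one round to the next; performance on held-out data still has to be checked separately.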

The flowchart in Fig. 3 shows the proposed models for MARS and B-MARS, whereby boosting in the latter is adopted to improve estimation performance by adjusting the weights of the classifiers.

Results and discussions

Five vehicular emission predictor variables, namely speed (m/s), acceleration (m/s²), power (W), temperature (°C) and load (Nm), were used with the response variable NOₓ (g/s) in an attempt to identify the relationships that developers of vehicular emission models wish to understand. To explore the factors affecting vehicular emission models, the present study provides results and some interpretations from the MARS and B-MARS models. Tables 1, 2, 3 and 4 summarize the variable selection results using MARS and B-MARS, whose beta factor coefficients $\beta_{m}$ are denoted $BF_{m}$. In the MARS and B-MARS models, basis functions are used to predict the effects of the independent variables on the NOₓ emission factor. The interpretation of B-MARS and MARS results is similar to, but not as straightforward as, that of classical linear regression models. A positive sign for the estimated beta factor of a basis function indicates increased NOₓ emission, while a negative sign indicates the opposite. The value of the beta factor gives the magnitude of the effect of the basis function (i.e., the variable effect) on the NOₓ emission. For each basis function, max(0, x−t) is equal to (x−t) when x is greater than t; otherwise the basis function is equal to zero. As shown in Tables 1, 2, 3 and 4, the MARS and B-MARS models contain 14, 15, 17 and 19 basis functions for on-board and dynamometer testing. The on-board measurement and dynamometer testing results for MARS and B-MARS have similar interpretations. It can be observed that all five predictor variables play crucial roles in determining vehicle NOₓ emission. From Table 1, beta factors BF1, BF2, BF3, BF4, BF5 and BF6 account for the nonlinear effect of vehicle speed in the emission model.

Table 1 List of basis functions of the MARS and their coefficients for on-board measurements

Table 2 List of basis functions of the B-MARS and their coefficients for on-board measurements

Table 3 List of basis functions of the MARS and their coefficients for dynamometer testing

Table 4 List of basis functions of the B-MARS and their coefficients for dynamometer testing

The effect of speed on NOₓ emissions can be explained as follows. Using the on-board measurement method for MARS, if the speed of the vehicle is lower than 8.11 m/s (29.2 km/h) for a short duration in traffic, it has a negligible impact on NOₓ emission (indicated by BF0). However, for a longer queuing time, such as in large cities, the amount of NOₓ emitted into the atmosphere can be significant, as NOₓ emission increases with combustion temperature. The NOₓ effect increases as the speed rises above 11.67 m/s (42 km/h) (indicated by BF2-BF5) because of the corresponding increase in combustion temperature. The emission rate can reach 0.043903 g/s when the speed is about 24.17 m/s or 82 km/h (indicated by BF6). This finding is consistent with previous findings in the literature. Carslaw et al. 2011 note that NOₓ emissions rise and fall in a reverse pattern to hydrocarbon (HC) emissions. As the speed of the vehicle increases, the mixture becomes leaner with more HCs; at the high temperatures in the combustion chamber, excess oxygen molecules appear and combine with nitrogen to form NOₓ. From Table 1, as the speed increases (indicated by BF2-BF6), the total NOₓ emitted from the tailpipe also increases. Beta factors BF7-BF10 in Table 1 show the nonlinear effect of vehicle acceleration on NOₓ, which can be described as follows. If the vehicle acceleration is less than 0.95 m/s², NOₓ emission will reduce by 0.0013075 g/s (indicated by BF7), but if the acceleration increases from 1.25 m/s² to 5.85 m/s², the NOₓ emission will increase by 0.0113075 g/s (indicated by BF8 and BF9). The NOₓ emission can reach more than 0.0311017 g/s when the acceleration exceeds 7.21 m/s². This result is similar to that for speed, because depressing the accelerator pedal increases acceleration and speed simultaneously.

The ambient temperature is also found to influence the NOₓ emission, as indicated by BF11, BF12 and BF13 of Table 1. The effects of ambient temperature on NOₓ emission are as follows: (1) if the ambient temperature is less than 22.12 °C, it has no effect on vehicle NOₓ emission (indicated by BF11); (2) if the ambient temperature is greater than 22.12 °C but less than 23.47 °C, NOₓ emission will increase by 0.00023075 g/s for each 1 °C increase in ambient temperature (indicated by BF11 and BF12); (3) if the ambient temperature is greater than 23.47 °C but less than 24.76 °C, the vehicle NOₓ emission will increase by 0.00313022 g/s for each 1 °C increase in ambient temperature (indicated by BF12 and BF13); and (4) if the ambient temperature is greater than 24.76 °C, the NOₓ emission will increase by 0.02113075 g/s for each 1 °C increase in ambient temperature (indicated by BF13). Higher ambient temperature is expected to result in more vehicle NOₓ emission, because NOₓ is formed in larger quantities in the cylinder as the combustion temperature exceeds the required limit. This finding is also consistent with the previous explanation. In addition, temperatures greater than 24.76 °C (BF13) produce significantly more NOₓ emissions.

As indicated by BF14, BF15 and BF16, the MARS results show the effect of load: (1) if the load is less than 10.53 Nm, it has no effect on NOₓ emission (indicated by BF14); (2) if the load is greater than 10.53 Nm but less than 52.34 Nm, the NOₓ emission will increase by 0.01561811 g/s for each 1 Nm increase in load (indicated by BF14 and BF15); (3) if the load is greater than 52.34 Nm but less than 60.15 Nm, the vehicle NOₓ emission will increase by 0.0179656 g/s for each 1 Nm increase in load (indicated by BF15 and BF16); and (4) if the load is greater than 60.15 Nm, the NOₓ emission will increase by 0.02324571 g/s for each 1 Nm increase in load (indicated by BF16). As for the effect of power on NOₓ emission, BF17 and BF18 indicate the following: (1) if the power is less than 8.98 W, it has no effect on vehicle NOₓ emission (indicated by BF17); (2) if the power is greater than 8.98 W but less than 21.32 W, NOₓ emission will increase by 0.01567893 g/s for each 1 W increase in power (indicated by BF17 and BF18); (3) if the power is greater than 21.23 W, the vehicle NOₓ emission will increase by 0.01567893 g/s for each 1 W increase in power (indicated by BF18). The increase in NOₓ emission with increasing load and power is expected, following the remark by Pierson et al. 1996 that driving a vehicle against a higher resistance increases the engine load and power, which results in increased carbon dioxide (CO₂) and NOₓ emissions.

To illustrate the NOₓ emission during real-world driving conditions and the dynamometer testing drive cycle, Figs. 4 and 5 show the MARS model that has the best performance based on independent test samples. There were 556 data points used in the analysis, 65 % of which were used for building the model (Learn) and 35 % for validation (Test).
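The 65 % / 35 % division of the 556 data points into Learn and Test sets can be reproduced in outline as follows. The text does not say whether the split was random or sequential, so the seeded random shuffle below is an assumption, as are the function name `learn_test_split` and the placeholder records.

```python
import random


def learn_test_split(records, learn_fraction=0.65, seed=0):
    """Shuffle record indices reproducibly and split them into two parts.

    The random shuffle is an assumption; the source only gives the 65/35
    proportions.
    """
    indices = list(range(len(records)))
    random.Random(seed).shuffle(indices)
    n_learn = round(learn_fraction * len(records))
    learn = [records[i] for i in indices[:n_learn]]
    test = [records[i] for i in indices[n_learn:]]
    return learn, test


# Placeholder records standing in for the 556 second-by-second samples.
records = list(range(556))
learn, test = learn_test_split(records)
print(len(learn), len(test))
```

With 556 records, rounding 0.65 × 556 = 361.4 gives 361 samples for building the model and leaves 195 for validation.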

The on-board system models have seventeen and nineteen basis functions, with the best model (least mean square error) occurring at the 15th basis function for B-MARS and the 17th for MARS, and R values of 97 % and 93 % for B-MARS and MARS respectively. The chassis dynamometer testing gave R values of 94 % and 88 % for the B-MARS and MARS models, with the best model occurring at BF11 and BF12. As shown in Figs. 6, 7, 8 and 9, the regression correlation coefficient R of the selected models clearly demonstrates a strong positive correlation in the NOₓ emissions model. Table 5 compares the MARS, B-MARS and Multiple Linear Regression (MLR) models. B-MARS obtained the best results, with R² values of 93 % and 89 % for on-board and dynamometer testing respectively, while the MARS model obtained R² of 87 % for the on-board and 77 % for the dynamometer testing. The MLR model shows the poorest performance, with R² values of 51 % and 50 % for the on-board and dynamometer tests respectively. The improved performance of the B-MARS algorithm is shown by its lower root mean square error (RMSE) of 0.00011 and 0.00014 and mean square error (MSE) of 1.236×10⁻⁸ and 1.905×10⁻⁸, compared with an RMSE of 0.00016 and 0.00022 and an MSE of 2.565×10⁻⁸ and 4.642×10⁻⁸ for MARS, for the on-board and dynamometer testing respectively. The contribution of the boosting algorithm confirms its ability not only to improve the prediction accuracy of the NOₓ emission but also to perform better in the process. Figures 10, 11, 12, 13, 14 and 15 provide detailed plots comparing the experimental data with the B-MARS and MARS predictions. The predicted emissions follow the experimental data with sufficiently good precision, with B-MARS performing particularly well in NOₓ emissions prediction. This suggests the robustness of the B-MARS algorithm and its capability of improving the accuracy of the MARS model in NOₓ emissions prediction.
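The error measures quoted above are related by RMSE = √MSE, which gives a quick consistency check on the reported figures. The sketch below defines the three measures in plain Python and applies the check to the MSE and RMSE values stated in the text; the function names and the small numerical example are illustrative assumptions.

```python
from math import sqrt


def mse(y, y_hat):
    """Mean square error between observations and predictions, as in Eq. (2)."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    """Root mean square error."""
    return sqrt(mse(y, y_hat))


def r_squared(y, y_hat):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# (label, reported MSE, reported RMSE) as quoted in the text.
reported = [
    ("B-MARS on-board", 1.236e-8, 0.00011),
    ("B-MARS dynamometer", 1.905e-8, 0.00014),
    ("MARS on-board", 2.565e-8, 0.00016),
    ("MARS dynamometer", 4.642e-8, 0.00022),
]
for label, mse_value, rmse_value in reported:
    # The square root of each MSE should round to the quoted RMSE.
    print(label, round(sqrt(mse_value), 5), rmse_value)

# Made-up observations and predictions to exercise the functions.
y_obs = [1.0, 2.0, 3.0, 4.0]
y_fit = [1.1, 1.9, 3.2, 3.8]
print(mse(y_obs, y_fit), rmse(y_obs, y_fit), r_squared(y_obs, y_fit))
```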

Table 5 Comparison of MARS, B-MARS and MLR model

Conclusion

In this paper, we have proposed the use of the Multivariate Adaptive Regression Splines (MARS) and Boosting Multivariate Adaptive Regression Splines (B-MARS) algorithms to estimate vehicular NOₓ emissions. The model approximates the nonlinear relationship between the NOₓ emission and speed, acceleration, temperature, power and load, which are considered as predictor variables. The B-MARS model is implemented with 14 and 17 piecewise-linear basis functions, and the MARS model with 19 and 15 basis functions. The model predicts the NOₓ emission by forming a weighted sum of basis functions of the predictor variables; thus, the predicted emission changes in a smooth and regular fashion with respect to input variations, offering some performance improvements. The results indicate a promising application of the proposed algorithms to estimating NOₓ emissions with reasonable accuracy. The method may usefully inform decision-making on urban air pollution policy.

References

Afotey, B, Sattler, M, Mattingly, SP, Chen, VCP (2013). Statistical model for estimating carbon dioxide emissions from a light-duty gasoline vehicle. Journal of Environmental Protection, 4, 8–15.
Carslaw, D, Beevers, S, Westmoreland, E, Williams, M, Tate, J, Murrells, T, Stedman, J, Li, Y, Grice, S, Kent, A, Tsagatakis, I. (2011). Trends in NOₓ and NO₂ Emissions and Ambient Measurements in the UK. King’s College London: University of Leeds Press.
Duc, H, Azzi, M, Wahid, H, Ha, QP (2013). Background ozone level in the sydney basin: assessment and trend analysis. International Journal of Climatology, 33, 2298–2308.
Durbin, T, Johnson, K, Cocker, ID, Miller, J, Maldonado, H, Shah, A, Ensfield, C, Weaver, C, Akard, M, Harvey, N (2007). Evaluation and comparison of portable emissions measurement systems and federal reference methods for emissions from a back-up generator and a diesel truck operated on a chassis dynamometer. Environmental Science & Technology, 41, 6199–6204.
Freund, Y, & Schapire, R (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55, 119–139.
Frey, HC, Unal, A, Rouphail, NM, Colyar, JD (2002). Use of on-board tailpipe emissions measurements for development of mobile source emission factors. In Proceedings of US Environmental Protection Agency Emission Inventory Conference, Atlanta, April, (pp. 1–13).
Frey, HC, Unal, A, Rouphail, NM (2003). On-road measurement of vehicle tailpipe emissions using a portable instrument. Journal of the Air and Waste Management Association, 53, 992–1002.
Friedman, JH (1991). Multivariate adaptive regression splines. Annals of Statistics, 19, 1–141.
Hastie, T, Tibshirani, R, Friedman, J. (2001). The elements of statistical learning: Data mining, inference and prediction, (pp. 337–343). Stanford, California: Springer-Verlag.
Friedman, JH (2001). Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29, 1189–1232.
Maykut, NN, Lewtas, J, Kim, E, Larson, TV (2007). Source apportionment of pm 2.5 at an urban improve site in seattle, washington. Environmental Science and Technology, 37, 5135–5142.
Nine, RD, Clark, N, Daley, JJ, Atkinson, CM (1999). Development of a heavy-duty chassis dynamometer driving route. Journal of Automobile Engineering, 213, 561–574.
Oduro, SD, Metia, S, Duc, H, Ha, QP (2013). CO ₂ vehicular emission statistical analysis with instantaneous speed and acceleration as predictor variables. In The 2nd International Conference on Control, Automation and Information Sciences. Nha Trang, Vietnam, (pp. 158–163).
Google Scholar
Oduro, SD, Metia, S, Duc, H, Hong, G, Ha, QP (2014). Prediction of no _x vehicular emissions using on-board measurement and chassis dynamometer testing. In Proceedings of The 31st International Symposium on Automation and Robotics in Construction and Mining. University of Technology, Sydney, Sydney Australia, (pp. 584–591).
Google Scholar
Pierson, WR, Gertler, AW, Robinson, NF, Sagebiel, JC, Zielinska, B, Bishop, AW, Stedman, DH, Zweidinger, RB, Ray, WD (1996). Real world automotive emissions summary of studies in the fort mchenry and tuscarora mountain tunnels. Atmospheric Environment, 30, 2233–2256.
Article Google Scholar
Put, R, Xu, Q, Massart, D, Heyden, Y (2004). Multivariate adaptive regression splines (mars) in chromatographic quantitative structure-retention relationship studies. Journal of Chromatography, 1055, 11–19.
Article Google Scholar
Querol, X, Viana, M, Alastuey, A, Amato, F, Moreno, T, Castillo, S, Pey, J, de la Rosa, J, Sánchez de la Campa, A, Artíñano, B, Salvador, P, García Dos Santos, S, Fernández-Patier, R, Moreno-Grau, S, Negral, L, Minguillón, MC, Monfort, E, Gil, JI, Inza, A, Ortega, LA, Santamaría, JM, Zabalzah, J (2007). Source origin of trace elements in pm from regional background, urban and industrial sites of spain. Atmospheric Environment, 44, 7219–7231.
Article Google Scholar
Rakha, H, Ahn, K, Moran, K, Saerens, B, Van den Bulck, E (2011). Simple Comprehensive Fuel Consumption and CO2 Emissions Model Based on Instantaneous Vehicle Power. In Transportation Research Board 90th Annual Meeting, Washington DC, 23–27 January 2011, Paper No. 11-1009.
Sharma, AR, Kharol, SK, Badarinath, KVS (2010). Influence of vehicular traffic on urban air quality – a case study of Hyderabad, India. Transportation Research Part D: Transport and Environment, 15, 154–159.
Tóth-Nagy, C, Conley, JJ, Jarrett, RP, Clark, NN (2006). Further validation of artificial neural network-based emissions simulation models for conventional and hybrid electric vehicles. Journal of the Air & Waste Management Association, 56, 898–910.
European Commission White Paper. Road to a Single European Transport Area - Towards a Competitive and Resource Efficient Transport System, Brussels, COM, 144 Final. http://ec.europa.eu/transport/strategies/2011_white_paper_en.htm.
MARS User Guide: San Diego, Salford Systems. http://www.salford-systems.com.
World Health Organization Fact Sheet No 313 “Air Quality and Health Updated”. http://www.who.int/mediacentre/factsheets/fs313/en/.

Acknowledgements

The data used for this study were supplied by the Road and Maritime Service, Department of vehicle emission, Compliance Technology & Compliance Operations, NSW Office of Environment & Heritage, and Horiba Australia. Assistance provided by Paul Walker and Thomas Mahsling is gratefully acknowledged.

Author information

Authors and Affiliations

Faculty of Engineering and Information Technology, University of Technology Sydney, Broadway, Sydney, NSW 2007, Australia
Seth Daniel Oduro, Santanu Metia, Guang Hong & Q.P. Ha
Office of Environment and Heritage, Lidcombe, Sydney, NSW 1825, Australia
Hiep Duc

Corresponding author

Correspondence to Seth Daniel Oduro.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

All authors contributed equally and significantly in writing this article. All authors read and approved the final manuscript.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0), which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Oduro, S.D., Metia, S., Duc, H. et al. Multivariate adaptive regression splines models for vehicular emission prediction. Vis. in Eng. 3, 13 (2015). https://doi.org/10.1186/s40327-015-0024-4

Received: 21 January 2014
Accepted: 23 March 2015
Published: 10 June 2015
DOI: https://doi.org/10.1186/s40327-015-0024-4
