Spline Semiparametric Regression Models

In this paper, we study semiparametric regression models with spline smoothing and determine the number of knots and their locations using statistical criteria; a simulation study has been performed.


Introduction
Regression analysis is a statistical tool that utilizes the relation between two or more quantitative variables so that one variable can be predicted from the other, or others. For example, if one knows the relation between advertising expenditures and sales, one can predict sales by regression analysis once the level of advertising expenditures has been set.
Linear regression is a statistical modeling technique that relates the change in one variable to other variables (see [9]). A simple linear regression line has an equation of the form $y = \beta_0 + \beta_1 x + \varepsilon$, where $x$ is the explanatory variable and $y$ is the dependent variable. The slope of the line is $\beta_1$, $\beta_0$ is the intercept, and $\varepsilon$ is an error term (see [11]).
In many applications in different fields, we need one of a collection of models for correlated data structures, for example, multivariate observations, clustered data, repeated measurements, longitudinal data and spatially correlated data. Often random effects are used to describe the correlation structure in this type of data. Mixed models are an extension of regression models that allow for the incorporation of random effects. However, they also turn out to be closely related to smoothing (see [13]).
In this paper we study semiparametric regression models with spline smoothing: we present the definition and properties of the statistical models and the estimation method. We also discuss the number of knots and their locations, and a simulation study has been performed. The sample sizes taken are n = 20, 100 and 250.

Nonparametric regression
Given data of the form $(x_i, y_i)$, $i = 1, \dots, n$, consider the model (see [5]):
$$y_i = f(x_i) + \varepsilon_i \qquad (1)$$
where the noise term $\varepsilon_i$ satisfies the usual conditions assumed for simple linear regression. We seek an estimate of the regression function $f$ satisfying model (1). There are several approaches to this problem; we will describe methods involving splines.

Splines
The discovery that piecewise polynomials, or splines, could be used in place of polynomials occurred in the early twentieth century. Splines have since become one of the most popular ways of approximating nonlinear functions. Splines are essentially defined as piecewise polynomials. Let $k$ be any real number; then we can define a $p$th degree truncated power function as (see [2,3,4,7,8]):
$$(x - k)_+^p = \begin{cases} 0, & x \le k \\ (x - k)^p, & x > k \end{cases} \qquad (2)$$
As a function of $x$, this function takes on the value 0 to the left of $k$, and it takes on the value $(x-k)^p$ to the right of $k$. The number $k$ is called a knot. The above truncated power function is a basic example of a spline. It is a member of the set of basis functions for the space of splines.
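To make the definition concrete, here is a minimal sketch of the truncated power function in Python (the function name `truncated_power` is ours, not from the paper):

```python
import numpy as np

def truncated_power(x, k, p):
    """p-th degree truncated power function (x - k)_+^p:
    0 to the left of the knot k, (x - k)^p to the right."""
    x = np.asarray(x, dtype=float)
    return np.where(x > k, (x - k) ** p, 0.0)

# Example: a cubic truncated power function with a knot at k = 0.
print(truncated_power([-1.0, -0.5, 0.0, 0.5, 1.0], k=0.0, p=3))
# -> [0.  0.  0.  0.125  1.]
```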
Let us consider a general $p$th degree spline with a single knot at $k$. Let $P(x)$ denote an arbitrary $p$th degree polynomial.

Then
$$S(x) = P(x) + u\,(x - k)_+^p \qquad (3)$$
takes on the value $P(x)$ for any $x \le k$, and it takes on the value $P(x) + u(x-k)^p$ for any $x > k$. Thus, restricted to each region, the function is a $p$th degree polynomial. As a whole, this function is a $p$th degree piecewise polynomial; there are two pieces.
Note that we require $p+2$ coefficients to specify this piecewise polynomial. This is a result of the addition of the truncated power function specified by the knot at $k$. In general, we may add $K$ truncated power functions specified at knots $k_1 < \cdots < k_K$, each multiplied by a different coefficient. This would result in $p + K + 1$ degrees of freedom.
An important property of splines is their smoothness. Polynomials are very smooth, possessing all derivatives everywhere. Splines possess all derivatives only at points which are not knots. The number of derivatives at a knot depends on the degree of the spline. Consider the spline given by (3). We can show that $S$ is continuous at $k$ by noting that
$$S(k) = P(k) \quad \text{and} \quad \lim_{x \to k^+} S(x) = P(k) + u\,(k - k)^p = P(k),$$
and we can argue similarly for the first derivatives $S^{(j)}$, $j = 1, 2, \dots, p-1$:
$$S^{(j)}(k) = P^{(j)}(k) \quad \text{and} \quad \lim_{x \to k^+} S^{(j)}(x) = P^{(j)}(k),$$
so that $S$ has $p-1$ continuous derivatives at $k$. The $p$th derivative behaves differently:
$$\lim_{x \to k^-} S^{(p)}(x) = P^{(p)}(k) \quad \text{and} \quad \lim_{x \to k^+} S^{(p)}(x) = P^{(p)}(k) + u\,p!,$$
so usually there is a discontinuity in the $p$th derivative. Thus $p$th degree splines are usually said to have no more than $p-1$ continuous derivatives.
The discussion below (3) indicates that we can represent any piecewise polynomial of degree $p$ in the following way: any such piecewise polynomial can be expressed as a linear combination of truncated power functions and a polynomial of degree $p$. In other words,
$$\{1, x, \dots, x^p, (x - k_1)_+^p, \dots, (x - k_K)_+^p\} \qquad (4)$$
is a basis for the space of $p$th degree splines possessing knots at $k_1, \dots, k_K$. By adding a noise term to (4), we obtain the spline regression model relating the response $y$ to the predictor $x$:
$$y_i = \beta_0 + \beta_1 x_i + \cdots + \beta_p x_i^p + \sum_{j=1}^{K} u_j (x_i - k_j)_+^p + \varepsilon_i. \qquad (5)$$

In matrix notation, model (5) can be written as $y = X\beta + \varepsilon$, where
$$X = \begin{bmatrix} 1 & x_1 & \cdots & x_1^p & (x_1 - k_1)_+^p & \cdots & (x_1 - k_K)_+^p \\ \vdots & & & & & & \vdots \\ 1 & x_n & \cdots & x_n^p & (x_n - k_1)_+^p & \cdots & (x_n - k_K)_+^p \end{bmatrix}, \qquad \beta = (\beta_0, \dots, \beta_p, u_1, \dots, u_K)^T.$$
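The construction of this design matrix can be sketched as follows (a hypothetical helper, assuming the conventions above):

```python
import numpy as np

def spline_design(x, knots, p):
    """Columns: 1, x, ..., x^p, (x-k_1)_+^p, ..., (x-k_K)_+^p  (basis (4))."""
    x = np.asarray(x, dtype=float)
    poly = np.vander(x, p + 1, increasing=True)              # polynomial part
    trunc = np.maximum(x[:, None] - np.asarray(knots), 0.0) ** p
    return np.hstack([poly, trunc])                          # n x (p + 1 + K)

X = spline_design(np.linspace(-1, 1, 6), knots=[-0.5, 0.0, 0.5], p=2)
print(X.shape)  # (6, 6): p + 1 = 3 polynomial columns plus K = 3 knot columns
```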
An unconstrained least squares fit with a large number of knots leads to a wiggly fit. For a judicious choice of a bound $C$, a constraint of the type $\sum_{j=1}^{K} u_j^2 \le C$ avoids this. The minimization problem can be written as: minimize $\|y - X\beta\|^2$ subject to $\sum_{j=1}^{K} u_j^2 \le C$. It can be shown, using a Lagrange multiplier argument, that this is equivalent to choosing $\beta$ to minimize
$$\|y - X\beta\|^2 + \lambda\, \beta^T D \beta \qquad (6)$$
for some $\lambda \ge 0$, where we define the matrix $D = \mathrm{diag}(0, \dots, 0, 1, \dots, 1)$, with $p+1$ zeros followed by $K$ ones, so that only the truncated power coefficients $u_1, \dots, u_K$ are penalized. This has the solution
$$\hat{\beta}_\lambda = (X^T X + \lambda D)^{-1} X^T y.$$
The term $\lambda\, \beta^T D \beta$ is called a roughness penalty because it penalizes fits that are too rough, thus yielding a smoother result. The amount of smoothing is controlled by $\lambda$ (the smoothing parameter). When the value of the smoothing parameter $\lambda$ is very large, the estimator is forced toward a polynomial of degree $p$ only, while $\lambda = 0$ means there is no roughness penalty at all.
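A minimal sketch of the penalized least squares solution of (6), assuming the design matrix layout above (the function name is ours):

```python
import numpy as np

def penalized_fit(X, y, lam, p):
    """Solve (X'X + lam * D) beta = X'y, where D = diag(0,...,0,1,...,1)
    penalizes only the truncated power coefficients (criterion (6))."""
    q = X.shape[1]
    D = np.diag([0.0] * (p + 1) + [1.0] * (q - p - 1))
    return np.linalg.solve(X.T @ X + lam * D, X.T @ y)

# Larger lam pulls the knot coefficients toward zero, i.e. toward a
# degree-p polynomial fit; lam = 0 gives the unpenalized regression spline.
```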

Number and position of knots
If the number of knots is too small, then the bias of the estimator can be large; if a large number of knots is preferred, we can even use all the observations as knots.
The literature proposes several approaches to automatic knot selection. Many of them are based on stepwise regression ideas. Although most of the proposed automatic knot selection procedures exhibit good performance, they are each quite complicated and computationally intensive. For penalized splines, a number of knots $K$ that usually works well is (see [8,12,15]):
$$K = \min\left(\tfrac{1}{4} \times \text{number of unique } x_i,\ 35\right),$$
and the positions of the knots are determined from sample quantiles: the $j$th knot is the $\frac{j+1}{K+2}$ sample quantile of the unique $x_i$, for $j = 1, \dots, K$.
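This default rule can be sketched as follows (our reading of the rule in [8,12,15]; treat the details as assumptions):

```python
import numpy as np

def default_knots(x):
    """K = min(n_unique / 4, 35) knots, the j-th knot placed at the
    (j + 1)/(K + 2) sample quantile of the unique x values."""
    ux = np.unique(x)
    K = int(min(len(ux) / 4, 35))
    probs = (np.arange(1, K + 1) + 1.0) / (K + 2)
    return np.quantile(ux, probs)

print(default_knots(np.linspace(-1, 1, 100)))  # 25 interior knots
```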

Cross Validation (CV)
Let $\hat{f}_\lambda(x)$ denote the regression estimate at a point $x$ with smoothing parameter $\lambda$. One of the most common measures of the goodness of fit of a regression curve to a scatter plot is the residual sum of squares (RSS): $\mathrm{RSS}(\lambda) = \sum_{i=1}^{n} (y_i - \hat{f}_\lambda(x_i))^2$. However, since the RSS is minimized by the interpolant, minimization of this criterion will lead to the smooth that is closest to interpolation. For penalized splines this corresponds to a zero smoothing parameter. Cross validation gets around this problem. The cross validation criterion is (see [5,12]):
$$\mathrm{CV}(\lambda) = \sum_{i=1}^{n} \left( y_i - \hat{f}_{-i}(x_i; \lambda) \right)^2,$$
where $\hat{f}_{-i}$ denotes the regression estimator applied to the data with $(x_i, y_i)$ deleted. The chosen $\lambda$ is the one that minimizes $\mathrm{CV}(\lambda)$ over $\lambda \ge 0$.
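A direct (brute force) sketch of the leave-one-out criterion; `fit` and `predict` stand for any regression spline fitting routine and are placeholders, not functions from the paper:

```python
import numpy as np

def cv_score(x, y, lam, fit, predict):
    """CV(lam): refit with (x_i, y_i) deleted and sum the squared
    prediction errors at the deleted points."""
    n = len(y)
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        model = fit(x[keep], y[keep], lam)
        total += (y[i] - predict(model, np.array([x[i]]))[0]) ** 2
    return total
```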

Generalized Cross Validation (GCV)
Efficient algorithms for the computation of CV were developed in the mid-1980s. Before that time, the difficulties surrounding computation of the cross-validation criterion led to the proposal of a simplified version. This simplified criterion is known as generalized cross-validation (GCV).
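The paper does not reproduce the GCV formula here; a common form (e.g. in [12]) replaces the leave-one-out residuals with the ordinary residuals scaled by $1 - \mathrm{tr}(S_\lambda)/n$, where $S_\lambda$ is the smoother matrix. A sketch under that assumption:

```python
import numpy as np

def gcv_score(X, y, lam, D):
    """GCV(lam) = n * RSS(lam) / (n - tr(S_lam))^2,
    with smoother matrix S_lam = X (X'X + lam D)^{-1} X'."""
    n = len(y)
    S = X @ np.linalg.solve(X.T @ X + lam * D, X.T)
    resid = y - S @ y
    return n * float(resid @ resid) / (n - np.trace(S)) ** 2
```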

4-Mixed Models
Mixed models are an extension of regression models that allow for the incorporation of random effects. A more contemporary application of mixed models is the analysis of longitudinal data, clustered data, repeated measurements and spatially correlated data. The general form of a linear mixed model is given as follows (see [12]):
$$y_i = X_i \beta + Z_i u_i + \varepsilon_i, \qquad i = 1, \dots, m,$$
where the vector $y_i$ has length $n_i$, and $X_i$ and $Z_i$ are, respectively, an $n_i \times p$ design matrix of fixed effects and an $n_i \times q$ design matrix of random effects; $\beta$ is a $p$-vector of fixed effects and the $u_i$ are $q$-vectors of random effects. The variance matrix $G = \mathrm{Cov}(u_i)$ is a $q \times q$ matrix and $R_i = \mathrm{Cov}(\varepsilon_i)$ is an $n_i \times n_i$ matrix.
We assume that the random effects and the set of error terms are independent. In matrix notation,
$$y = X\beta + Zu + \varepsilon. \qquad (12)$$
Here $y$ has length $n = \sum_i n_i$, $X$ is an $n \times p$ design matrix of fixed effects, $Z$ is a block diagonal design matrix of random effects, $u = (u_1^T, \dots, u_m^T)^T$ is the stacked vector of random effects, $G = \mathrm{Cov}(u)$ is its covariance matrix and $R = \mathrm{Cov}(\varepsilon)$ is a block diagonal matrix.
We now treat estimation of $\beta$, prediction of $u$, and estimation of the parameters in $G$ and $R$. One way to derive an estimate of $\beta$ is to rewrite (12) as
$$y = X\beta + \varepsilon^*, \qquad \varepsilon^* = Zu + \varepsilon.$$
This is just a linear model with correlated errors, since
$$\mathrm{Cov}(\varepsilon^*) = V = Z G Z^T + R.$$
For given $V$, the estimator of $\beta$ is
$$\tilde{\beta} = (X^T V^{-1} X)^{-1} X^T V^{-1} y, \qquad (13)$$
which is sometimes referred to as the generalized least squares (GLS) estimator. For $u$ and $\varepsilon$ having general distributions, (13) can be shown to be the best linear unbiased estimator (BLUE) of $\beta$. Alternatively, if $(u, \varepsilon)$ is multivariate normal, then the right-hand side of (13) is both the maximum likelihood estimator (MLE) and the uniformly minimum variance unbiased estimator (UMVUE). The latter is the estimator that has the best (smallest) possible variance of any unbiased estimator regardless of the parameter values [12]. The random effects vector $u$ can be predicted via best linear prediction:
$$\tilde{u} = G Z^T V^{-1} (y - X\tilde{\beta}). \qquad (14)$$
The BLUP of $(\beta, u)$ can also be written as
$$\begin{bmatrix} \tilde{\beta} \\ \tilde{u} \end{bmatrix} = (C^T R^{-1} C + B)^{-1} C^T R^{-1} y, \qquad (15)$$
where $C = [X\ Z]$ and $B = \begin{bmatrix} 0 & 0 \\ 0 & G^{-1} \end{bmatrix}$. The fitted values are then
$$\tilde{y} = X\tilde{\beta} + Z\tilde{u} = H y, \qquad (16)$$
where $H$ is called the hat matrix or smoother matrix. The log-likelihood of $y$ under the normal model is
$$\ell(\beta, V) = -\tfrac{1}{2}\left\{ n \log(2\pi) + \log|V| + (y - X\beta)^T V^{-1} (y - X\beta) \right\}. \qquad (17)$$
By substituting (13) into (17) we obtain the profile log-likelihood for $V$.
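A compact sketch of (13), (14) and (16) for given variance components (the function name is ours):

```python
import numpy as np

def gls_blup(X, Z, y, G, R):
    """GLS estimate of beta (13), BLUP of u (14), fitted values (16)."""
    V = Z @ G @ Z.T + R                                  # Cov(Zu + eps)
    Vi_X, Vi_y = np.linalg.solve(V, X), np.linalg.solve(V, y)
    beta = np.linalg.solve(X.T @ Vi_X, X.T @ Vi_y)       # (13)
    u = G @ Z.T @ np.linalg.solve(V, y - X @ beta)       # (14)
    return beta, u, X @ beta + Z @ u                     # (16)
```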

4-1 Penalized spline as BLUPs
The penalized spline fitting criterion (6), when divided by $\sigma_\varepsilon^2$, can then be written as (see [12]):
$$\frac{1}{\sigma_\varepsilon^2} \|y - X\beta - Zu\|^2 + \frac{\lambda}{\sigma_\varepsilon^2} \|u\|^2.$$
Notice that this can be made to equal the BLUP criterion by treating the $u$ as a set of random coefficients with
$$\mathrm{Cov}(u) = \sigma_u^2 I, \quad \text{where} \quad \lambda = \sigma_\varepsilon^2 / \sigma_u^2.$$
Putting all of this together yields the mixed model representation of the penalized regression spline, and the fitted values $\tilde{y}$ can be rewritten as $\tilde{y} = X\tilde{\beta} + Z\tilde{u}$.
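The equivalence can be checked numerically; the sketch below compares the penalized least squares coefficients with the mixed-model solution for $\lambda = \sigma_\varepsilon^2/\sigma_u^2$ (all data and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 50, 5
x = np.sort(rng.uniform(-1, 1, n))
knots = np.quantile(x, (np.arange(1, K + 1) + 1.0) / (K + 2))
X = np.column_stack([np.ones(n), x])              # fixed effects: 1, x
Z = np.maximum(x[:, None] - knots, 0.0)           # linear truncated power terms
y = np.sin(3 * x) + rng.normal(0, 0.2, n)

sig_eps, sig_u = 0.2, 0.5
lam = sig_eps ** 2 / sig_u ** 2                   # lambda = sig_eps^2 / sig_u^2

# Penalized least squares on C = [X Z], penalty on u only:
C = np.hstack([X, Z])
D = np.diag([0.0, 0.0] + [1.0] * K)
pls = np.linalg.solve(C.T @ C + lam * D, C.T @ y)

# Mixed model BLUP with G = sig_u^2 I, R = sig_eps^2 I:
V = sig_u ** 2 * Z @ Z.T + sig_eps ** 2 * np.eye(n)
beta = np.linalg.solve(X.T @ np.linalg.solve(V, X), X.T @ np.linalg.solve(V, y))
u = sig_u ** 2 * Z.T @ np.linalg.solve(V, y - X @ beta)
print(np.allclose(pls, np.concatenate([beta, u])))  # True: identical fits
```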

5-Semiparametric Models
Let the model be
$$y = X\beta + Zu + \varepsilon,$$
where $X$ contains the parametric components, $Z$ contains the spline basis functions of the nonparametric component, and $D$ is the penalty matrix. For a given smoothing parameter matrix $\Lambda$, the penalized least squares estimators from (25) are
$$\begin{bmatrix} \hat{\beta} \\ \hat{u} \end{bmatrix} = (C^T C + \Lambda D)^{-1} C^T y, \qquad C = [X\ Z],$$
and the fitted values are $\hat{y} = C(\hat{\beta}^T, \hat{u}^T)^T = S_\Lambda y$, where $S_\Lambda$ is the smoothing matrix given by
$$S_\Lambda = C (C^T C + \Lambda D)^{-1} C^T.$$
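A sketch of the semiparametric fit with a single smoothing parameter (so $\Lambda$ reduces to a scalar $\lambda$; the split of columns into parametric and spline parts is illustrative):

```python
import numpy as np

def semiparametric_fit(X_par, Z_spl, y, lam):
    """Penalized LS for y = X beta + Z u + eps: fitted values S_lam y,
    with penalty D = diag(0,...,0,1,...,1) acting on the spline part."""
    C = np.hstack([X_par, Z_spl])
    D = np.diag([0.0] * X_par.shape[1] + [1.0] * Z_spl.shape[1])
    A = np.linalg.solve(C.T @ C + lam * D, C.T)
    S = C @ A                                    # smoothing matrix S_lam
    return A @ y, S @ y                          # coefficients, fitted values
```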

Numerical results:
The settings for the simulation study are as follows. The observations for the design variable are generated from a uniform distribution on the interval [-1,1], for various sample sizes. These values are kept fixed for all settings to reduce simulation variability. The sample sizes taken are n = 20, 100 and 250. We have used two semiparametric regression functions, which represent a variety of shapes. For the error distribution we used the normal distribution N(0, $\sigma^2$); we have tried different choices of $\sigma$ as well. The penalty parameter is chosen by minimizing the generalized cross validation (GCV) criterion; results are shown in Table 1.
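The paper's two test functions and the exact $\sigma$ values are not recoverable from the text; the sketch below reproduces the rest of the setup with a stand-in function:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(f, n, sigma):
    """x ~ U[-1, 1] (kept fixed across settings in the paper),
    y = f(x) + N(0, sigma^2) noise."""
    x = np.sort(rng.uniform(-1.0, 1.0, n))
    return x, f(x) + rng.normal(0.0, sigma, n)

f = lambda x: np.sin(2 * np.pi * x)   # stand-in; not the paper's test function
for n in (20, 100, 250):
    x, y = simulate(f, n, sigma=0.2)  # sigma value is illustrative
```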
To give an impression of the variability of the obtained estimators, we plot in Figures 1-6 scatter plots of the randomly generated data sets together with the fitted values from the penalized least squares regression spline estimation method. The goodness of fit of the estimated models is quantified by computing the following criteria:

4) Average Mean Absolute Error (AMAE) is defined by:
$$\mathrm{AMAE} = \frac{1}{N} \sum_{j=1}^{N} \frac{1}{n} \sum_{i=1}^{n} \left| y_{ij} - \hat{y}_{ij} \right|,$$
where $N$ is the number of frequencies (replications). Table 1 presents summary values of the GCV and AMAE criteria for the estimation methods. In all cases, when the sample size is 250 the values of GCV and AMAE are smaller than those for the other two sizes (20 and 100); hence, as can be seen from Figures 1-6 and Table 1, the results are better with sample size n = 250 for the two test functions.
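Under our reading of the (partly garbled) definition, AMAE averages the mean absolute error over the N replications; a sketch:

```python
import numpy as np

def amae(y_reps, yhat_reps):
    """AMAE = (1/N) * sum_j mean_i |y_ij - yhat_ij| over N replications."""
    return float(np.mean([np.mean(np.abs(y - yh))
                          for y, yh in zip(y_reps, yhat_reps)]))
```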

Conclusion:
From the results for the test functions we can see that increasing the value of $\sigma$ of the normal error distribution decreases the degrees of freedom of the fit and increases the value of the smoothing parameter.

Figure 1: Fitted curves from penalized regression spline estimation of the first test function, with design variable X uniformly distributed on [-1,1], normally distributed errors, and sample sizes n = 20, 100 and 250.

Figure 2: Fitted curves from penalized regression spline estimation of the first test function, with design variable X uniformly distributed on [-1,1], normally distributed errors, and sample sizes n = 20, 100 and 250.
Figure 3

Figure 4: Fitted curves from penalized regression spline estimation of the second test function, with design variable X uniformly distributed on [-1,1] and normally distributed errors.

Figure 5: Fitted curves from penalized regression spline estimation of the second test function, with design variable X uniformly distributed on [-1,1], normally distributed errors, and sample sizes n = 20, 100 and 250.
Figure 6

Table 1: Results of the simulation study.