ESTIMATION OF MONTHLY MEAN REFERENCE EVAPOTRANSPIRATION USING GENE EXPRESSION PROGRAMMING

Evapotranspiration is a main component of the water cycle and is important for crop growth. In this study, monthly mean reference evapotranspiration (ETo) is estimated using gene expression programming (GEP) for Basrah City, southern Iraq. Climatic data, namely air temperature, relative humidity, and wind speed, are used as inputs to the GEP models to estimate the ETo values given by the FAO-56 Penman-Monteith equation. Nine input combinations are tested with GEP and coded as Models No. 1-9. The root relative squared error (RRSE) is used as the fitness function in each GEP model. GEP models with three climatic input variables (temperature, relative humidity, and wind speed) perform best. The GEP technique was successfully employed to estimate ETo in the study area. The explicit formulas obtained can be used as powerful models for estimating mean monthly ETo in irrigation practice with limited climatic data.


INTRODUCTION
Evapotranspiration (ET) is the combination of soil evaporation and plant transpiration. It is defined as the sum of the volume of water used per unit area by vegetative growth in transpiration and that evaporated from the soil, snow, or intercepted precipitation on a given area in any specified time (Al-Barrak, 1964). Evaporation and transpiration occur simultaneously, and there is no easy way of distinguishing between the two processes (Allen et al., 1998). The evaporative power of the atmosphere is expressed by the reference crop evapotranspiration (ETo). ETo represents the evapotranspiration from a reference vegetated surface; a large uniform grass field is taken as the reference surface. The concept of reference evapotranspiration was introduced to study the evaporative demand of the atmosphere independently of crop type, crop development, and management practices (Allen et al., 1998).
The FAO Penman-Monteith (PM) method is recommended as the sole method for estimating ETo. This method closely represents grass ETo, and the transpiration of many plants is related to that of grass (Kisi et al., 2013). ET is an important part of the water cycle (Jason A. et al., 2010). ET is a complex phenomenon because it depends on several climatological factors, such as temperature, relative humidity, wind speed, and radiation, as well as on the type and growth stage of the crop. ET can be measured directly with a lysimeter, but this method is time consuming and requires careful planning (Khoshhal and Mokarram, 2012). Thus, indirect methods based on climatological data are suitable for ETo estimation (Kumar et al., 2002).
In recent years, the artificial neural network (ANN) approach has been used to model reference evapotranspiration (Sudheer et al., 2003; Kumar et al., 2002; Kisi, 2006a, 2006b, 2007; Kumer et al., 2011; Khoshhal and Mokarram, 2012). An adaptive neuro-fuzzy computing technique has been applied to estimate pan evaporation and evapotranspiration (Kisi and Ozturk, 2007). Dogan (2009)
Inputs required for PM computations include several climate variables, such as air temperature, wind speed, relative humidity, and solar radiation, that are not always available or reliable. GEP, which can give reasonable results with different sets of weather input data, is therefore an attractive alternative. When calibrated for a specific area, such models are a useful tool for estimating ETo with limited weather data, and they avoid the lengthy calculations of the PM method. The main objective of this research is therefore to use GEP models to estimate the monthly mean ETo given by the FAO-56 PM equation in Basrah City, southern Iraq. Various input combinations of weather data are investigated using GEP models; the best combinations are selected according to three comparison criteria: root mean squared error (RMSE), mean absolute error (MAE), and coefficient of correlation (R).

CASE STUDY
Basrah City is located on the Shatt Al-Arab River in southern Iraq. It is located between longitude lines ( Since ancient times, Basrah has been an agricultural area where palm trees, fruit, and vegetables are grown. Basrah is also known for growing tomatoes in the Safwan-Al Zubair area (south west of the city center) in the winter season, which supplies the tomato demand of other Iraqi provinces. The climate information used in this study is obtained from the meteorological recording station in Basrah City for the period . Conservation of existing water supplies is of first importance in water management. To meet this need, more information about evapotranspiration and irrigation requirements for satisfactory crop production is necessary (Al-Barrak, 1964). For timely and efficient water application, agricultural managers have long relied on evapotranspiration measurements or estimates. Therefore, an accurate assessment of ET is a prerequisite for improving water management practices (Roula Bachour, 2013).

CALCULATION OF REFERENCE EVAPOTRANSPIRATION (ETo)
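The FAO Penman-Monteith (PM) equation for estimating ETo is given by Allen et al. (1998) as:

$$ET_o = \frac{0.408\,\Delta\,(R_n - G) + \gamma\,\dfrac{900}{T + 273}\,u_2\,(e_s - e_a)}{\Delta + \gamma\,(1 + 0.34\,u_2)} \qquad (1)$$

where ET_o is the reference evapotranspiration [mm day^-1], R_n is the net radiation at the crop surface [MJ m^-2 day^-1], G is the soil heat flux density [MJ m^-2 day^-1], T is the mean air temperature at 2 m height [°C], u_2 is the wind speed at 2 m height [m s^-1], e_s and e_a are the saturation and actual vapour pressures [kPa], \Delta is the slope of the vapour pressure curve [kPa °C^-1], and \gamma is the psychrometric constant [kPa °C^-1].
The actual vapour pressure (e_a) is calculated from the relative humidity (RH). The extraterrestrial radiation (R_a) for each day of the year and for different latitudes is estimated from the solar constant, the solar declination, and the time of the year. Solar radiation (R_s) and clear-sky solar radiation (R_so) are then computed; R_so depends on Z, the station elevation above sea level [m].
The net shortwave radiation (R_ns) results from the balance between incoming and reflected solar radiation. The net radiation (R_n) is the difference between the incoming net shortwave radiation (R_ns) and the outgoing net longwave radiation (R_nl):

$$R_n = R_{ns} - R_{nl}$$

Finally, the soil heat flux density G [MJ m^-2 day^-1] for a monthly period is:

$$G_{month,i} = 0.14\,(T_{month,i} - T_{month,i-1})$$

where T_{month,i} and T_{month,i-1} are the mean air temperatures of month i and of the previous month [°C].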
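The calculation chain described above can be illustrated with a minimal Python sketch. It assumes the standard FAO-56 forms of the auxiliary equations (saturation vapour pressure, slope of the vapour pressure curve, clear-sky radiation, and an albedo of 0.23 for the grass reference surface), which may differ in detail from the forms used by the authors. All function names are hypothetical and chosen for illustration only.

```python
import math


def saturation_vapour_pressure(t_c):
    """e0(T) in kPa for air temperature T in deg C (standard FAO-56 form)."""
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))


def slope_vapour_pressure_curve(t_c):
    """Delta in kPa per deg C at air temperature T (standard FAO-56 form)."""
    return 4098.0 * saturation_vapour_pressure(t_c) / (t_c + 237.3) ** 2


def clear_sky_radiation(ra, z):
    """R_so from extraterrestrial radiation R_a and station elevation Z [m]."""
    return (0.75 + 2.0e-5 * z) * ra


def net_shortwave_radiation(rs, albedo=0.23):
    """R_ns: incoming minus reflected solar radiation (albedo is an assumption)."""
    return (1.0 - albedo) * rs


def monthly_soil_heat_flux(t_month, t_prev_month):
    """G for a monthly period from the mean temperatures of two months."""
    return 0.14 * (t_month - t_prev_month)


def penman_monteith_eto(t_max, t_min, u2, rn, g, ea, gamma):
    """Reference evapotranspiration [mm/day] from the FAO-56 PM equation."""
    t_mean = (t_max + t_min) / 2.0
    # Saturation vapour pressure as the mean of the values at Tmax and Tmin.
    es = (saturation_vapour_pressure(t_max) + saturation_vapour_pressure(t_min)) / 2.0
    delta = slope_vapour_pressure_curve(t_mean)
    numerator = (0.408 * delta * (rn - g)
                 + gamma * 900.0 / (t_mean + 273.0) * u2 * (es - ea))
    denominator = delta + gamma * (1.0 + 0.34 * u2)
    return numerator / denominator


# Illustrative monthly values (not data from the Basrah station).
g = monthly_soil_heat_flux(30.2, 29.2)
eto = penman_monteith_eto(t_max=34.8, t_min=25.6, u2=2.0,
                          rn=14.33, g=g, ea=2.85, gamma=0.0674)
print(round(eto, 2))
```

With these illustrative inputs the sketch gives roughly 5.7 mm/day. The psychrometric constant and net radiation are passed in directly to keep the example short; in a full calculation they would be derived from station elevation and the radiation terms listed above.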

GENE-EXPRESSION PROGRAMMING
Gene-expression programming (GEP) is an evolutionary artificial intelligence technique developed by Ferreira (2001). GEP is a natural development of genetic algorithms (GAs) and genetic programming (GP). GEP uses the same kind of diagram representation as GP, but the entities evolved by GEP are the expression of a genome (Ferreira, 2001). The genome, or chromosome, consists of a linear symbolic string of fixed length composed of one or more genes. The basic difference between these algorithms is the nature of the individuals. In GAs the individuals are symbolic strings of fixed length (chromosomes), and in GP the individuals are non-linear entities of different sizes and shapes (parse trees). In GEP the individuals are encoded as symbolic strings of fixed length (chromosomes), which are then expressed as non-linear entities of different sizes and shapes (expression trees). GEP genes are composed of a head and a tail. The head contains symbols that represent both functions and terminals, whereas the tail contains only terminals. The terminal sets contain the independent variables of the problem. Table (1) shows the terminal sets used in the present research. After the initial population is generated, each individual is expressed, and its fitness is evaluated using one of the fitness function equations. Individuals are then selected for replication according to their fitness and the luck of the roulette. After the selection process, the individuals are reproduced with some modifications performed by the genetic operators.
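The way a fixed-length gene is expressed as an expression tree can be sketched in Python as follows. The sketch assumes the breadth-first reading of the gene described by Ferreira (2001); the symbol set (the four arithmetic operators, `Q` for square root, and single-letter terminals) and the function names are illustrative assumptions, not the function set used in this study.

```python
import math
import operator

# Illustrative function set: symbol -> (number of arguments, implementation).
FUNCTIONS = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
    "/": (2, operator.truediv),
    "Q": (1, math.sqrt),
}


def decode(gene):
    """Express a gene string as a tree of [symbol, children] nodes.

    The gene is read left to right and the tree is filled level by level,
    so trailing symbols that are not needed are simply left unexpressed.
    """
    nodes = [[symbol, []] for symbol in gene]
    root = nodes[0]
    queue = [root]
    next_free = 1
    while queue:
        node = queue.pop(0)
        arity = FUNCTIONS[node[0]][0] if node[0] in FUNCTIONS else 0
        for _ in range(arity):
            child = nodes[next_free]
            next_free += 1
            node[1].append(child)
            queue.append(child)
    return root


def evaluate(node, variables):
    """Evaluate an expression tree for a dict of terminal values."""
    symbol, children = node
    if symbol in FUNCTIONS:
        func = FUNCTIONS[symbol][1]
        return func(*(evaluate(child, variables) for child in children))
    return variables[symbol]


# Head "Q*+-" and tail "abcdaaba": only the first eight symbols are expressed,
# giving sqrt((a + b) * (c - d)).
tree = decode("Q*+-abcdaaba")
value = evaluate(tree, {"a": 1.0, "b": 2.0, "c": 7.0, "d": 4.0})
print(value)  # 3.0
```

Because the tail holds only terminals, every such gene decodes to a valid tree, which is what allows the genetic operators to act freely on the linear string.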
Genetic operators, such as mutation, transposition of insertion sequence elements, and recombination, are used for these modifications. The new individuals are then subjected to the same process of modification, which advances the search from the initial population toward the final population containing the desired optimal solution (Ferreira, 2001a, b).
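Roulette-wheel selection and point mutation, two of the steps named above, can be sketched as follows. This is a simplified illustration under stated assumptions: the symbol sets, the mutation rate, and the function names are hypothetical, and transposition and recombination are omitted.

```python
import random

FUNCTION_SYMBOLS = "+-*/"   # illustrative function set
TERMINAL_SYMBOLS = "abc"    # illustrative terminal set


def roulette_select(population, fitness, rng):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitness)
    pick = rng.uniform(0.0, total)
    running = 0.0
    for individual, fit in zip(population, fitness):
        running += fit
        if pick < running:
            return individual
    return population[-1]  # guard against floating-point round-off


def mutate(gene, head_length, rate, rng):
    """Point mutation that keeps the head/tail structure of the gene.

    Head positions may change to any function or terminal; tail positions
    may change only to terminals, so the mutated gene is still valid.
    """
    mutated = []
    for position, symbol in enumerate(gene):
        if rng.random() < rate:
            if position < head_length:
                pool = FUNCTION_SYMBOLS + TERMINAL_SYMBOLS
            else:
                pool = TERMINAL_SYMBOLS
            symbol = rng.choice(pool)
        mutated.append(symbol)
    return "".join(mutated)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
population = ["+*ababcab", "-a/bcabca", "*+-aabcab"]
fitness = [0.2, 0.5, 0.9]
parent = roulette_select(population, fitness, rng)
child = mutate(parent, head_length=4, rate=0.2, rng=rng)
```

Restricting tail mutations to terminals is the design choice that keeps every offspring decodable, so no repair step is needed after the genetic operators are applied.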

METHODOLOGY
In the present study, monthly data for mean air temperature (MEAN AIR TEMP), maximum air temperature (MAX AIR TEMP), minimum air temperature (MIN AIR TEMP), relative humidity (RELATIVE HUMIDITY), and wind speed (WIND SPEED) at 2 m above the ground surface in Basrah City, southern Iraq, are used as inputs to the GEP model to estimate the values of reference evapotranspiration (ETo) given by the FAO-56 PM equation. DTREG (Predictive Modeling Software) (Phillip H. Sherrod, 2003) is used in the present research. Nine input combinations tested in the study area with GEP are coded as Model No. (1-9) (see Table (1)). The root relative squared error (RRSE) (Eq. 15) is taken as the fitness function. It is the square root of the residual variance of the fitted model divided by the initial variance. The initial variance (Eq. 16) is the variance (Eq. 17) for the training data set when the mean value of the target variable is used as the predicted value for all rows. All fitness functions compute fitness scores that range from 0.0 to 1.0. A fitness of 0.0 means the model fits very poorly; it is worthless or not viable. A fitness score of 1.0 means the model fits the data perfectly. The first step in the GEP model building process is to create an initial population with a random set of functions and terminals. The population size in the present models is 50; this is the number of chromosomes in the population being evolved. A population size in the range of 30 to 80 chromosomes usually works well (Phillip H. Sherrod, 2003). Model building parameters used in this study were given in Table ( Where: Y and \hat{Y} are the observed and estimated values, respectively; n is the number of observations; and \bar{Y} and \bar{\hat{Y}} are the means of the observed and estimated values, respectively.
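The fitness measure and the comparison criteria described above can be sketched in Python. RRSE follows the verbal definition in the text (residual sum of squares divided by the sum of squares about the observed mean, under a square root), and RMSE, MAE, and R use their usual definitions. The mapping from RRSE to a 0-1 fitness score, 1/(1 + RRSE), is an assumption made for illustration; the text does not state which mapping DTREG applies. Function names are hypothetical.

```python
import math


def rrse(observed, estimated):
    """Root relative squared error: residual variance over initial variance."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((y - e) ** 2 for y, e in zip(observed, estimated))
    initial = sum((y - mean_obs) ** 2 for y in observed)
    return math.sqrt(residual / initial)


def fitness_from_rrse(value):
    """Assumed mapping to a score in (0, 1]; 1.0 means a perfect fit."""
    return 1.0 / (1.0 + value)


def rmse(observed, estimated):
    """Root mean squared error."""
    n = len(observed)
    return math.sqrt(sum((y - e) ** 2 for y, e in zip(observed, estimated)) / n)


def mae(observed, estimated):
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(y - e) for y, e in zip(observed, estimated)) / n


def correlation(observed, estimated):
    """Coefficient of correlation R between observed and estimated values."""
    mean_obs = sum(observed) / len(observed)
    mean_est = sum(estimated) / len(estimated)
    cov = sum((y - mean_obs) * (e - mean_est) for y, e in zip(observed, estimated))
    var_obs = sum((y - mean_obs) ** 2 for y in observed)
    var_est = sum((e - mean_est) ** 2 for e in estimated)
    return cov / math.sqrt(var_obs * var_est)


# Illustrative values only (not data from this study).
observed = [1.0, 2.0, 3.0, 4.0]
estimated = [2.0, 3.0, 4.0, 5.0]
print(rmse(observed, estimated), mae(observed, estimated))  # 1.0 1.0
```

In this illustrative case the estimates are offset by a constant, so R is essentially 1 while RMSE and MAE are both 1.0; this is why the three criteria are read together rather than relying on R alone.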

RESULTS AND DISCUSSION
Nine input combinations tested in the study area with GEP are coded as models No. (1) to No. (9). The formula and statistical performance of each GEP model are shown in Table (3). The inputs of the GEP models are based on the weather data available in the study area; the performance of the models is evaluated by computing the root mean squared error (RMSE), mean absolute error (MAE), and coefficient of correlation (R). The choice of input set can significantly affect model performance. For example, model No. 2 has two inputs, maximum and minimum air temperature, and it underestimates the reference evapotranspiration given by the FAO-56 PM equation, as shown in Fig. 2. Model No. 5 also has two weather inputs, minimum air temperature and wind speed. The presence of wind speed alongside minimum air temperature increased the performance of this model, as shown in Fig. 2. The diversity of input variables increases the accuracy of the predicted ETo. A model whose inputs include relative humidity and wind speed in addition to temperature is significantly more accurate than a model with only temperature variables, such as models 1, 3, and 6. Based on the above results, two climatic variables are not enough to obtain a very good estimate of ETo. When more inputs are used in the estimation procedure, such as wind speed, relative humidity, and maximum and minimum air temperature, the performance of the GEP models increases significantly (see Fig. 3). Obviously, as shown in the statistical performance of Table (

CONCLUSION
In this study, the performance of the gene expression programming technique in estimating reference evapotranspiration as a function of climatic variables in Basrah City is presented. Nine input combinations tested with GEP are coded as Model No. 1-9. The PM reference evapotranspiration was calculated based on 264 months of weather data from . The climatic variables in the input sets affect the accuracy of the GEP models. The diversity of input variables increases the accuracy of the ETo representation. A model whose inputs include relative humidity and wind speed is significantly more accurate than a model with only temperature inputs. The GEP models with three climatic input variables (temperature, relative humidity, and wind speed), such as model No. 4 and model No. 9, perform best, with R equal to 0.9953 and 0.9966, respectively. The GEP technique was successfully employed to estimate ETo in the study area. The explicit formulas obtained can be used as convenient and powerful models for estimating ETo in irrigation practice with limited climatic data.