4. PERFORMANCE ANALYSIS

Method of analysis:

Crop yield prediction can help agricultural departments devise strategies for improving agriculture. Crop production depends on climatic, geographical, biological, political and economic factors. These factors introduce risks, which can be quantified when appropriate mathematical or statistical methodologies are applied. Accurate information about the historical yield of a crop is an important modelling input, helpful to farmers and government organizations in the decision-making process of establishing proper policies. In this paper we propose a method for crop yield prediction using a classifier.

The proposed crop yield prediction consists of three phases, namely preprocessing, feature reduction and prediction. The proposed method takes real-world data as input. Real-world data is often incomplete, inconsistent, and/or lacking in certain behaviours or trends, and is likely to contain many errors. Data preprocessing is a proven method of resolving such issues: good data preprocessing helps to create a better model and consumes less time.
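The preprocessing step described above can be sketched with pandas. This is a minimal illustration under assumed, hypothetical column names (`temperature`, `rainfall`, `yield`), not the paper's actual dataset: rows missing the target are dropped and missing feature values are imputed with the column mean.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with missing values (NaN), standing in for real-world records
df = pd.DataFrame({
    "temperature": [25.1, np.nan, 27.3, 26.0],
    "rainfall":    [120.0, 95.5, np.nan, 110.2],
    "yield":       [2.1, 1.8, 2.4, np.nan],
})

# Drop rows whose target is missing, then impute missing features with column means
df = df.dropna(subset=["yield"])
df = df.fillna(df.mean())
print(df.isna().sum().sum())  # 0 — no missing values remain
```

Mean imputation is only one choice; median imputation or row removal are equally valid depending on how much data can be spared.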

The next phase of the proposed method is feature reduction; here we use multilinear principal component analysis (MPCA). Finally, the proposed method predicts the crop yield by means of regression. The performance of the proposed method is evaluated by prediction accuracy and error value.

4.1. Regression:

Regression analysis is a form of predictive modelling technique which investigates the association between a dependent variable (target) and one or more independent variables (predictors).

Linear regression:

Linear regression is a linear approach for modelling the link between a scalar dependent variable y and one or more independent variables denoted X. The case of a single independent variable is called simple linear regression.

Non-linear regression:

Nonlinear regression is a form of regression analysis in which observational data are modelled by a function which is a nonlinear combination of the model parameters and depends on one or more independent variables. The data are fitted by a method of successive approximations.

Figure 4.1.1. Linear Regression & Non-Linear Regression

Multi-linear regression:

The difference between simple linear regression and multiple linear regression is that multiple linear regression has more than one independent variable, whereas simple linear regression has only one. In this project, the Multiple Linear Regression algorithm is used to predict the crops. Multiple Regression is an extension of simple Linear Regression. It is used when we want to predict the value of a variable based on the values of two or more other variables. The variable we want to predict is called the dependent variable (or sometimes the outcome, target or criterion variable). The variables we use to predict the value of the dependent variable are called the independent variables (or sometimes the predictor, explanatory or regressor variables).
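As a rough stand-in for the MPCA feature-reduction phase mentioned above, the sketch below uses ordinary PCA from scikit-learn on flattened features. This is an assumption for illustration only — true multilinear PCA operates on tensor-structured data, which ordinary PCA does not.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))   # 50 synthetic samples with 8 raw features

# Project onto the top 3 principal components (feature reduction)
pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)  # (50, 3)
```

The reduced matrix `X_reduced` would then feed the regression phase in place of the raw features.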
For example, Multiple Regression can be used to understand whether exam performance can be predicted from revision time, test anxiety, lecture attendance and gender. Multiple Regression also allows you to determine the overall fit (variance explained) of the model and the relative contribution of each of the predictors to the total variance.

Formulae:

A Linear Regression model that contains more than one predictor variable is called a Multiple Linear Regression model. The following is a Multiple Linear Regression model with two predictor variables, x1 and x2:

y = β0 + β1x1 + β2x2 + ε

where β0, β1, β2 are the coefficients of Multiple Linear Regression and x1, x2 are the independent variables. The model is linear because it is linear in the parameters β0, β1 and β2. The model describes a plane in the three-dimensional space of y, x1 and x2. The parameter β0 is the intercept of this plane. Parameters β1 and β2 are referred to as partial regression coefficients: β1 represents the change in the mean response corresponding to a unit change in x1 when x2 is held constant, and β2 represents the change in the mean response corresponding to a unit change in x2 when x1 is held constant.

Set theory:

S = {I, Fm, O, Sc, F}

I = {I1} ... set of inputs, where I1 = location of user

Fm = {GetLocation(), GetAttributes(latitude, longitude), GetSoil(), GetWeather(), FeasibleCrop(soil, weather), PastProduction(), ProfitableCrop(FeasibleCrops, PastProduction), MaxProfitableCrops()} ... set of functions,
where soil → {N, P, K components} and weather → {temperature and rainfall values}

O = {crop predicted for given location} ... set of outputs

Sc = correct prediction for high production and profit ... success condition

F = failure in prediction due to incorrect training data ... failure condition

4.3. Mathematical representation of algorithm:

yi = β0 + β1 xi1 + ... + βk xik + εi,  for i = 1, 2, ..., n

where β0, β1, ..., βk are the coefficients of Multiple Linear Regression and x1, x2, ..., xk are the independent variables, with X = {weather attributes, soil attributes} and Y = {production}.
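The model above can be fitted by ordinary least squares directly in NumPy. The sketch below generates synthetic data from a known two-predictor plane (the coefficient values 2.0, 0.5 and −1.5 are made up for illustration) and recovers the coefficients:

```python
import numpy as np

# Synthetic data from a known plane y = 2 + 0.5*x1 - 1.5*x2 + noise
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = 2.0 + 0.5 * X[:, 0] - 1.5 * X[:, 1] + 0.1 * rng.normal(size=200)

# Prepend an intercept column, then solve the normal equations (X'X) β = X'y
Xd = np.column_stack([np.ones(len(X)), X])
beta = np.linalg.solve(Xd.T @ Xd, Xd.T @ y)
print(beta)  # estimates close to the true coefficients [2.0, 0.5, -1.5]
```

With only mild noise, the recovered β0, β1, β2 land very close to the generating values, which is what the least-squares estimate guarantees for well-conditioned data.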
In matrix form:

Y = X β + E

where Y is the production matrix, X is the attribute matrix, β is the partial-coefficient matrix and E is the error term.

β̂ = (X'X)^-1 X'Y ... least squares estimate, where X' denotes the transpose and (·)^-1 the inverse of a matrix.

Prediction: Ŷ = X β̂

Residual: res = Y − Ŷ

Multivariate linear regression:

Linear regression is one of the most used techniques for predicting a series. Linear regression comes under supervised learning, where the data is arranged in a continuous pattern. We chose linear regression as our first step for the model. The features of our crop yield prediction include average temperature, precipitation, diurnal temperature range and potential evapotranspiration with respect to year. All this data is arranged in a heat map where the index is the month and the column is the year. Initially, the model we took for this prediction is a simple line equation, mathematically y = mx + c.

[Figures: regression line, input for prediction, output of prediction]

The prediction process first includes cleaning the given dataset: all 'NaN'/missing values are removed or replaced. To apply a linear model, the dataset is split into two data frames, a training dataset and a testing dataset. The training dataset contains all the input features given above, and the test dataset is used to find the accuracy of the model. We use further metrics such as standard deviation and mean squared error to assess the accuracy of the model. After visualising the data, we saw the correlation between a few attributes such as average temperature, and found that linear regression can be applied to the prediction model rather than other techniques. The inputs we took were data from the past 100 years with all these attributes. We used the above procedure, divided the data into train and test sub-datasets, and then fitted the linear model. The accuracy we found was 10 percent, and the results were pretty decent when compared to other linear models.
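The clean–split–fit–score procedure described above can be sketched with scikit-learn. The features and coefficients below are synthetic stand-ins, not the paper's 100-year dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for weather/soil features and a yield target
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4))  # e.g. temperature, rainfall, N, P (assumed)
y = X @ np.array([1.0, -0.5, 0.3, 0.8]) + 0.2 * rng.normal(size=300)

# Hold out 25% of the rows to measure accuracy on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)
print(mean_squared_error(y_test, y_pred))  # small for this low-noise synthetic data
```

The mean squared error on the held-out test set is the error metric the section refers to; lower values indicate a better fit.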
International Journal of Pure and Applied Mathematics, Special Issue, 12518

All this computation was made using sklearn, a Python library, for the rainfall prediction. The predictions using the linear model are tabulated below.

The K-Nearest Neighbours approach is based on feature similarity: it depends on how closely the training set can classify the given data at a point. The nearest-neighbour parameter is defined as the number of training samples closest to the new point used for the prediction. It is a user-defined constant based on the local density of points. For this agricultural data, we trained our prediction model by varying the n-th nearest-neighbour parameter from 1 to 11, owing to the number of months present in our data. We found the results to vary for each value of n. The accuracies on the test and train data are tabulated below; they increase continuously until 11 nearest neighbours and then reach a saturation point.
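Sweeping the nearest-neighbour parameter from 1 to 11 as described above can be sketched with scikit-learn's `KNeighborsRegressor` on synthetic data (the smooth target function below is an assumption for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic smooth target over two features, standing in for the monthly data
rng = np.random.default_rng(3)
X = rng.uniform(size=(200, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] + 0.05 * rng.normal(size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Vary the n_neighbors parameter from 1 to 11 and report the test R^2 score
for k in range(1, 12):
    knn = KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train)
    print(k, round(knn.score(X_test, y_test), 3))
```

Plotting or tabulating the scores against k reveals the saturation behaviour the section describes: accuracy changes with each k until the extra neighbours stop adding information.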