Scatter Plot

*Short Summary*

A scatter plot, or X-Y scatter diagram, reveals relationships or associations between two variables. Such relationships show up as any non-random structure in the plot. Regression analysis, most commonly linear regression, can be applied to the plotted data to determine whether there is a functional relationship between the independent (X) variable and the dependent (Y) variable.

This Giovanni-3 operation takes the user-selected data variables and creates a scatter plot. Linear regression is an optional output for this scatter plot.

To make the scatter plot, the operation takes two one-dimensional arrays as input. It generates a scatter plot from these two data arrays, and also calculates and draws a linear fit for the scatter plot on the same display image.
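As a rough illustration of the fitting step, the sketch below takes two one-dimensional sequences and returns the slope and intercept of a straight-line fit. It assumes an ordinary least-squares fit, which this page does not confirm as the method Giovanni uses. The function name `linear_fit` and the sample data are hypothetical, and the plotting itself is omitted.

```python
def linear_fit(xs, ys):
    """Least-squares fit of y = a*x + b (assumed method); returns (a, b)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Hypothetical sample data lying exactly on y = 2x + 1.
x_values = [0.0, 1.0, 2.0, 3.0]
y_values = [1.0, 3.0, 5.0, 7.0]
slope, intercept = linear_fit(x_values, y_values)
print(slope, intercept)  # prints: 2.0 1.0
```

The two returned values are all that is needed to draw the fitted line over the scattered points.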

*Long Summary*

A scatter plot, scatter diagram, or scatter graph is a graph used in statistics to visually display and compare two or more sets of related quantitative (numerical) data by displaying a finite number of points, each having a coordinate on a horizontal and a vertical axis. A scatter plot shows the position of all of the cases in an x-y or x-y-z coordinate system. Relationships between variables can be identified from a scatter graph. Each dot in the body of the chart marks the intersection of one case's values on the x and y axes.

In probability theory and statistics, correlation, also called the correlation coefficient, indicates the strength and direction of a linear relationship between two random variables. In general statistical usage, correlation or co-relation refers to the departure of two variables from independence, although correlation does not imply causality. In this broad sense there are several coefficients that measure the degree of correlation, each adapted to the nature of the data.

Simple linear regression and multiple linear regression are related statistical methods for modeling the relationship between two or more random variables using a linear equation. Simple linear regression refers to a regression on two variables, while multiple regression refers to a regression on more than two variables. Linear regression assumes that the best estimate of the response is a linear function of some parameters (though not necessarily linear in the predictors).

A linear equation is an equation involving only the sum of constants or products of constants and the first power of a variable. Such an equation is equivalent to equating a first-degree polynomial to zero. These equations are called "linear" because they represent straight lines in Cartesian coordinates. A common form of a linear equation in two variables is

y = ax + b

In this form, the value *a* determines the slope or gradient of the line, and the value *b* determines the point at which the line crosses the y-axis. Equations involving terms such as x^{2}, y^{1/3}, and xy are "non-linear".
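For reference, if the line is fitted by ordinary least squares (an assumption here, since this page does not state which fitting method is used), the values of *a* and *b* for *n* paired measurements x_{i}, y_{i} would be:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i

a = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad
b = \bar{y} - a\,\bar{x}
```

These are the values that minimize the sum of squared vertical distances between the points and the line y = ax + b.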

If we have a series of *n* measurements of X and Y written as x_{i} and y_{i}, where i = 1, 2, ..., *n*, then the sample correlation coefficient can be used to estimate the correlation of X and Y. This estimate is especially important when X and Y are both normally distributed, in which case the sample correlation coefficient is the best estimate of the correlation of X and Y.
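Written out, the usual (Pearson) sample correlation coefficient for these *n* measurements is given below. That this is the exact form computed by the operation is an assumption; the bars denote sample means.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;
          \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \quad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
```

The value of r lies between -1 and 1, with the sign giving the direction of the linear relationship.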

Linear fitting also computes the correlation coefficient (r) and the root mean square (RMS) error.
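A minimal sketch of how these two statistics could be computed alongside the fit is shown below. It assumes a least-squares line, the Pearson form of r, and an RMS error defined as the square root of the mean squared residual (dividing by *n*). The page does not specify these definitions, and the name `fit_statistics` and the sample data are hypothetical.

```python
import math


def fit_statistics(xs, ys):
    """Return (r, rms) for a least-squares line through the points (assumed definitions)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    # Least-squares slope and intercept of y = a*x + b.
    a = sxy / sxx
    b = mean_y - a * mean_x
    # Pearson correlation coefficient.
    r = sxy / math.sqrt(sxx * syy)
    # RMS of the residuals, taken here as sqrt(mean squared residual).
    residuals = [y - (a * x + b) for x, y in zip(xs, ys)]
    rms = math.sqrt(sum(e * e for e in residuals) / n)
    return r, rms


# Hypothetical sample data with some scatter about a line.
x_values = [0.0, 1.0, 2.0, 3.0]
y_values = [0.0, 1.0, 1.0, 2.0]
r_value, rms_error = fit_statistics(x_values, y_values)
print(round(r_value, 4), round(rms_error, 4))  # prints: 0.9487 0.2236
```

Some conventions divide the squared residuals by *n* - 2 rather than *n*, which would change the RMS value slightly for small samples.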

This Technical Summary page is under development. For information on this operation while the document is being prepared, please send a message to GES DISC Help.