It's a convention for the bell-shaped curve you get if you measure many kinds of things. The distribution of values falls into a pattern which is remarkably similar for many different things, with a clear central value. Squaring all the differences from the mean does get rid of the minus values, as you say. You then add the squares up, divide by the number of values to get their average (the variance), and take the square root; the result, the standard deviation, is a standard measure of how spread out the values are. Then you can use one, two, or three standard deviations to express in a standardised way the proportion of your population that is close to the mean: for a bell-shaped (normal) distribution, roughly 68%, 95%, and 99.7% respectively. Squaring the differences also accentuates the extreme low and high values, and it gives a common mathematical footing that lets many different populations be compared. In other examples of root mean square you get values that have physical meaning, in electrical power for example, where the RMS value is more useful than a plain average (which for a symmetric alternating signal comes out at zero). But to calculate the average or mean itself you just use the plus and minus values without squaring them.
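
To make the arithmetic concrete, here is a rough Python sketch of the steps described above. The function names and the sample numbers are made up for illustration, and it assumes the population form of the standard deviation (dividing by the number of values); the sample form divides by one less than that.

```python
import math

def mean(values):
    # Plain average: plus and minus values are used as they are, no squaring.
    return sum(values) / len(values)

def std_dev(values):
    # Population standard deviation (assumption: divide by N, not N - 1).
    m = mean(values)
    squared_diffs = [(v - m) ** 2 for v in values]  # squaring removes the minus signs
    variance = sum(squared_diffs) / len(values)     # mean of the squared differences
    return math.sqrt(variance)                      # back to the original units

def rms(values):
    # Root mean square of the values themselves (not of differences from the mean).
    return math.sqrt(sum(v * v for v in values) / len(values))

samples = [2, 4, 4, 4, 5, 5, 7, 9]        # illustrative data only
m = mean(samples)
deviations = [v - m for v in samples]

print(m)                  # central value
print(sum(deviations))    # unsquared differences cancel out
print(std_dev(samples))   # typical distance from the mean

alternating = [3, -3, 3, -3]              # crude stand-in for an alternating signal
print(mean(alternating), rms(alternating))
```

The unsquared differences always sum to zero, which is why they can't measure spread on their own, and the alternating example shows the same thing for RMS: the plain average says nothing about the size of the signal, while the RMS does.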

When we realize that patterns don't exist in the universe, but are a template that we hold up to the universe to make sense of it, it all makes a lot more sense.

