
Optimizing Code - Compound IF-ANDs

dgavin
2012-Jul-17, 01:35 PM
This is one of the easier optimizations that can be done when dealing with often-used code (large data sets or large loops), and sometimes the most overlooked one, and it works on any platform. The trick to pulling it off well involves a little data or loop analysis, though.

Take the following compound IF:

if (enddate >= selecteddatevalue
    && startdate <= selecteddatevalue
    && activeflag == true
    && state.compareTo(selectedstate) == 0)
{
    ...do something
}

It may run much faster (see the explanation below) if it's reorganized as follows:

if (activeflag == true)
    if (state.compareTo(selectedstate) == 0)
        if (startdate <= selecteddatevalue)
            if (enddate >= selecteddatevalue)
            {
                ...do something
            }

The basic idea of if/when you should reorganize compound IFs is to analyze the data to determine which condition would eliminate the most (be true the least), and code that IF first, the second most eliminating condition second, and so on.

In the case where I first learned this the hard way, by practical experience, I cut the run time of a process from nearly 20 minutes to 2 just by reorganizing a single IF with six conditions into 6 cascading IFs, where the first IF tended to be false most often. You typically do have to do a little data research to determine that. With SQL servers that's a simple matter, but if it's flat file data, it can be done by temporarily adding counters for each part of the condition into your code, one time, to get results: find the most often false condition, then the next most often false, and so on.
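The temporary-counter idea above can be sketched in C. This is a minimal, hypothetical instrumentation harness; the four condition names are illustrative only:

```c
#include <stdbool.h>

/* Temporary instrumentation, as described above: one counter per
   condition, bumped whenever that condition fails. Run this once
   over the data, then order the real IF by descending failure count. */
static long fails[4];

void tally(bool active_ok, bool state_ok, bool start_ok, bool end_ok)
{
    if (!active_ok) fails[0]++;
    if (!state_ok)  fails[1]++;
    if (!start_ok)  fails[2]++;
    if (!end_ok)    fails[3]++;
}
```

After one pass over the data, whichever counter is largest identifies the condition to test first.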

The reason this works is a simple matter. In the compound IF example, it has to check all 4 data items on every iteration of the data or loop (think of it as doing 4 compare instructions every time, no matter what). In the reorganized example, the first IF, when false, prevents the other three IFs from ever being executed at all. Therefore, fewer machine instructions are actually executed on the CPU (it eliminates 3 compare instructions when the first is false). Don't rely on CPU optimizations or compilers alone in this case, as actually reorganizing the IF, when (or if) possible, produces faster results than what those optimizers tend to accomplish.
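The reordered test can be sketched as a C function. The record layout and field names here are hypothetical, mirroring the example above, with dates as YYYYMMDD integers for simplicity:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical record layout; field names are illustrative only. */
struct record {
    bool active;
    char state[3];
    int startdate;   /* dates as YYYYMMDD integers */
    int enddate;
};

/* Conditions ordered with the most-often-false test first, so a
   non-matching record is rejected after as few comparisons as possible. */
bool matches(const struct record *r, const char *selectedstate, int selecteddate)
{
    if (!r->active)
        return false;
    if (strcmp(r->state, selectedstate) != 0)
        return false;
    if (r->startdate > selecteddate)
        return false;
    if (r->enddate < selecteddate)
        return false;
    return true;
}
```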

NEOWatcher
2012-Jul-17, 01:58 PM
The basic idea of if/when you should reorganize compound IFs is to analyze the data to determine which condition would eliminate the most (be true the least), and code that IF first, the second most eliminating condition second, and so on.
Yes; this can be very beneficial.

The reason this works is a simple matter. In the compound IF example, it has to check all 4 data items on every iteration of the data or loop (think of it as doing 4 compare instructions every time, no matter what).
In many languages this is not true. They will stop on the first false condition.
Although, nesting is sometimes more readable when you want to group or separate concepts, and it is a more reliable way to do it.

There are also some logic issues to consider.
If I write "not (a or b)", some languages will have to evaluate the entire expression in all cases.
But if I write "(not a) and (not b)", they will stop at the evaluation of "a" in some cases (provided the language does stop).

Strange
2012-Jul-17, 02:15 PM
Correctly ordering the conditions is also important when there is the possibility of an error caused by certain values. For example,

if ((p != NULL) && (*p > 0)) { // pointer will only be dereferenced if it is not NULL
...
}

Or ...

if ((p == NULL) || (*p == 0)) { // pointer will only be dereferenced if it is not NULL
...
}

dgavin
2012-Jul-17, 02:19 PM
Yes; this can be very beneficial.

In many languages this is not true. They will stop on the first false condition.
Although, nesting is sometimes more readable when you want to group or separate concepts, and it is a more reliable way to do it.

There are also some logic issues to consider.
If I write "not (a or b)", some languages will have to evaluate the entire expression in all cases.
But if I write "(not a) and (not b)", they will stop at the evaluation of "a" in some cases (provided the language does stop).

Even when languages stop after the first non-true condition, you still get a benefit if you reorganize the ANDs with the most-failing condition first, without translating them to IFs. I haven't tested whether the stop-after-first-false behavior is faster or slower than a full reorganization into IFs, so I didn't want to comment on that. But yes, most compilers nowadays tend to stop evaluating complex AND conditions when the first false is reached.
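That benefit within a single && can be shown with call counters. A hypothetical sketch in C: the first condition is cheap and rarely true, the second stands in for a costlier test, and short-circuiting keeps the second from running for most inputs:

```c
#include <stdbool.h>

static long cheap_calls = 0, costly_calls = 0;

/* Rarely-true condition: false for 99% of inputs. */
bool rarely_true(int x)
{
    cheap_calls++;
    return x % 100 == 0;
}

/* Usually-true condition, standing in for a more expensive test. */
bool usually_true(int x)
{
    costly_calls++;
    return x >= 0;
}

/* With the most-failing condition first, && short-circuits and the
   second test runs only for the few values that pass the first. */
long count_matches(int n)
{
    long hits = 0;
    for (int x = 0; x < n; x++)
        if (rarely_true(x) && usually_true(x))
            hits++;
    return hits;
}
```

With the conditions in the opposite order, the costlier test would run on every one of the n iterations.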

I totally agree with you about the readability and reliability of fully reorganizing into nested IFs.

dgavin
2012-Jul-17, 02:23 PM
Correctly ordering the conditions is also important when there is the possibility of an error caused by certain values. For example,

if ((p != NULL) && (*p > 0)) { // pointer will only be dereferenced if it is not NULL
...
}

Or ...

if ((p == NULL) || (*p == 0)) { // pointer will only be dereferenced if it is not NULL
...
}

Nice point. I tend to code complex applications for maintainability by others, so I try to avoid pointers, but that's definitely another good reason for ordering conditions.

Strange
2012-Jul-17, 04:22 PM
Nice point. I tend to code complex applications for maintainability by others, so I try to avoid pointers, but that's definitely another good reason for ordering conditions.

It doesn't just apply to pointers; it could be that the second condition would divide by zero or similar.
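The divide-by-zero case can be sketched the same way as the pointer guard; the function and its parameters here are illustrative:

```c
#include <stdbool.h>

/* The zero check must come first: when count is 0, && short-circuits
   and the division on the right never executes. */
bool above_threshold(int total, int count, int threshold)
{
    return (count != 0) && (total / count > threshold);
}
```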

swampyankee
2012-Jul-17, 04:56 PM
Code optimization is usually a waste of effort; algorithm optimization is where one gets the real gains. The best implementation for bubble sort is never going to be as good as even a mediocre one for Shell or quick sort when more than a trivial amount of data (a few dozen items, say) are involved. In other words, O(n^2) will quickly get beaten by O(n log n).

Incidentally, decent Fortran compilers have been able to beat the best hand-coded assembly language for about 40 years. I think they've only recently been bested by C or C++, and that's mostly because Fortran has gone out of fashion.

dgavin
2012-Jul-17, 05:50 PM
Code optimization is usually a waste of effort; algorithm optimization is where one gets the real gains. The best implementation for bubble sort is never going to be as good as even a mediocre one for Shell or quick sort when more than a trivial amount of data (a few dozen items, say) are involved. In other words, O(n^2) will quickly get beaten by O(n log n).

Incidentally, decent Fortran compilers have been able to beat the best hand-coded assembly language for about 40 years. I think they've only recently been bested by C or C++, and that's mostly because Fortran has gone out of fashion.

Since when was I discussing either SyncSort or SQL search algorithms? I was planning to cover that in a different topic; this one is for dealing with complex IF optimizations.

selden
2012-Jul-17, 05:59 PM
To paraphrase swampyankee's comment: whatever compilers or interpreters you folks are using are seriously deficient. Their lack of optimizations needs to be reported to whoever is supposed to be maintaining them. They certainly aren't production quality.

On the other hand, if you're trying to educate one another on the types of optimizations modern compilers implement... ignore my interjection.

NEOWatcher
2012-Jul-17, 06:15 PM
On the other hand, if you're trying to educate one another on the types of optimizations modern compilers implement... ignore my interjection.
A lot of times that is true, but in the case of the OP's example, the optimizer is not going to help since it's data dependent.

It's also going to depend on the application. I've had nearly 30 years in the industry, and little tricks like this really don't make a lot of difference. The last time I ran into these kinds of issues, it's been related to CAD and a lot of number crunching.

Sometimes you have to give up efficiency for readability, modifiability, and flexibility. All it takes is one user to request a new feature or slight change and you can be sunk.

dgavin
2012-Jul-17, 06:55 PM
A lot of times that is true, but in the case of the OP's example, the optimizer is not going to help since it's data dependent.

It's also going to depend on the application. I've had nearly 30 years in the industry, and little tricks like this really don't make a lot of difference. The last time I ran into these kinds of issues, it's been related to CAD and a lot of number crunching.

Sometimes you have to give up efficiency for readability, modifiability, and flexibility. All it takes is one user to request a new feature or slight change and you can be sunk.

Very true; balancing code towards maintainability is usually the best guideline. However, sometimes optimizing is required. I had one project a few years ago that was dealing with a 97 billion record/row data set as input into a data conversion routine. In that case we had to throw about every kind of optimization at that code we possibly could, including using multi-threading... that was a bear to maintain when we finished with it, but it was a one-time-use program.

cjameshuff
2012-Aug-17, 08:42 PM
Focusing on the construction of conditionals is being rather overly specific. The basic principle to apply here is to look for ways to quickly exclude entries/samples/etc from consideration, and attempt those first to minimize the number of times you have to do more expensive tests. This may involve simply splitting a complex test into simpler sub-tests and dropping out early, or it can involve creating an approximate test that gives some false positives, but allows you to quickly reject "misses".

For example, say you have a set of strings stored by a character array and a length value. You need to do an equality test. A very simple optimization is to check that the lengths are equal first...strings with unequal lengths can't be equal. Equal lengths aren't sufficient to determine equality, but are necessary, and it's a very quick test. (This is not so useful when using C null-terminated strings, as you have to check each character of the string anyway to find the terminating null...)
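That length-first test can be sketched in C with an explicit-length string type (the struct layout here is illustrative):

```c
#include <stdbool.h>
#include <string.h>

/* A string stored as a character array plus an explicit length,
   as described above. */
struct lstring {
    const char *chars;
    size_t len;
};

/* Length check first: unequal lengths reject the pair before any
   character comparison happens. */
bool lstring_equal(const struct lstring *a, const struct lstring *b)
{
    if (a->len != b->len)
        return false;
    return memcmp(a->chars, b->chars, a->len) == 0;
}
```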

If you want to do a large number of equality tests on a data set, it can similarly be worthwhile to precalculate a hash value for each entry. Comparing hash values can be a matter of a single integer comparison, which (given a large enough hash) will be sufficient to reject the vast majority of unequal pairs. Computing a hash value is more expensive than doing a comparison, but it only has to be done once per entry...the resulting hash value can be reused until the entry is modified.
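A minimal sketch of the precomputed-hash idea, using FNV-1a (a simple, well-known string hash) purely as an illustration:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* FNV-1a 64-bit string hash. */
uint64_t fnv1a(const char *s)
{
    uint64_t h = 1469598103934665603ULL;
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ULL;
    }
    return h;
}

struct entry {
    const char *text;
    uint64_t hash;   /* computed once, reused until text changes */
};

bool entry_equal(const struct entry *a, const struct entry *b)
{
    if (a->hash != b->hash)               /* one integer compare rejects most misses */
        return false;
    return strcmp(a->text, b->text) == 0; /* confirm on a hash match */
}
```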

But as swampyankee said, it's the algorithms and data structures where the real possibilities for optimization are. A linear search is going to have to visit half the data set on average every time you search for something in the set, and will have to visit the entire data set every time you search for an item not in the set. If you sort the data set, you can check whether a given item is in the range between the lowest and highest ordered items with just two comparisons, and can do a binary search that lets you reject half the remaining data set at each step. Tracking down some sub-optimal conditionals might halve your search time, smarter data structures might take your average number of comparisons per search from 16 million to a couple dozen.
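The sorted-set search described above, with both the two-comparison range rejection and the halving step, might look like this:

```c
#include <stdbool.h>

/* Binary search over a sorted array: the two range checks reject
   anything outside the stored values, and each loop iteration
   halves the remaining search space. */
bool contains(const int *sorted, int n, int key)
{
    if (n == 0 || key < sorted[0] || key > sorted[n - 1])
        return false;
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (sorted[mid] == key)
            return true;
        if (sorted[mid] < key)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return false;
}
```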

dgavin
2012-Aug-18, 04:38 PM
cjameshuff,

I'm not disagreeing with what you and swampyankee are saying at all, but I'm trying to keep these guides focused on one aspect of optimization at a time. In this case, structuring IFs according to the most likely false condition first. Aka, knowing your data; which, granted, sometimes makes it impossible to order IFs this way when the data is somewhat unpredictable in nature.