Oh yes, caching is a very useful tool. I sprinkle it everywhere.
I'm sure I don't need to give any tips to Grant or Reid, but for any novice programmers reading, there are some things to be wary of.
Testing for a cache hit or miss has a cost. Modern CPUs use pipelining and branch prediction, and taking a mispredicted branch is expensive because the speculative work in the pipeline is thrown away. So sometimes the cost of the cache check is actually higher than the savings. It's only worth doing to avoid genuinely expensive computation; adding a branch to avoid one or two floating point multiplications is counter-productive.
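To make that concrete, here's a rough C++ sketch (the Particle class and its fields are invented purely for illustration). The first getter caches two multiplies and an add, which the branch can easily outweigh; the second caches a transcendental call, which is worth it:

    #include <cmath>

    // Hypothetical sketch: two memoised getters, to show when the cache
    // test itself costs more than the work it saves.
    struct Particle {
        double x = 0.0, y = 0.0;

        // Counter-productive: a mispredicted branch can cost more than
        // the two multiplies and an add it is trying to avoid.
        double radiusSquaredCached() {
            if (!rsqValid) {
                rsq = x * x + y * y;
                rsqValid = true;
            }
            return rsq;
        }

        // Worthwhile: a transcendental call costs ~50 units, so a
        // ~3-10 unit branch is a good trade.
        double angleCached() {
            if (!angValid) {
                ang = std::atan2(y, x);
                angValid = true;
            }
            return ang;
        }

        void set(double nx, double ny) {
            x = nx; y = ny;
            rsqValid = angValid = false;  // invalidate on change
        }

    private:
        double rsq = 0.0, ang = 0.0;
        bool rsqValid = false, angValid = false;
    };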
It's an ever-changing environment, different users have different CPUs, and profiling only tells you about your own particular machine. But as a rough rule of thumb I go by: 1 unit of time for simple instructions like add, 3 for multiplication, 5 for division, and about 50 for things like cos. Predicted branches cost about 3, "wrong" (mispredicted) branches about 10, and function calls about 20.
I'd value other people's rules of thumb because I am not at all confident about these numbers. They are very ballparky.
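If you want numbers for your own machine, a crude timing harness like the one below is one way to get them. It's only a sketch: the constants are arbitrary, and micro-benchmarks like this are easily distorted by the optimiser and by pipelining, so treat the output as ballpark at best:

    #include <chrono>
    #include <cmath>
    #include <cstdio>

    int main() {
        const int N = 10'000'000;
        volatile double sink = 1.0;  // volatile stops the loops being optimised away

        auto t0 = std::chrono::steady_clock::now();
        for (int i = 0; i < N; ++i) sink = sink * 1.0000001;  // multiply
        auto t1 = std::chrono::steady_clock::now();
        for (int i = 0; i < N; ++i) sink = std::cos(sink);    // transcendental
        auto t2 = std::chrono::steady_clock::now();

        auto ms = [](auto a, auto b) {
            return std::chrono::duration_cast<std::chrono::milliseconds>(b - a).count();
        };
        std::printf("multiply: %lld ms, cos: %lld ms\n",
                    (long long)ms(t0, t1), (long long)ms(t1, t2));
    }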
That said, I think the future trend in CPU design is to attack the cost of mispredicted branches by devoting more speculative execution in that direction, so anything that works today could be useless in 5 or 10 years' time.
The next issue is choosing an invalid cache identifier for initialization. The standard trick, when the data is known to always be >= 0, is to use -1. But one fairly common type of bug is doing a mod that introduces negative numbers while forgetting (or never even knowing) that -1 is treated as an out-of-range value by some bit of caching code buried in the depths.
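Here's that bug in miniature (the cache and expensiveCompute are invented stand-ins). In C++ the result of % takes the sign of the dividend, so a negative input can yield exactly -1 and silently match the sentinel:

    #include <cstdio>

    // cachedKey == -1 means "nothing cached yet", which is fine
    // while keys are guaranteed >= 0.
    long long cachedKey = -1;
    double cachedValue = 0.0;

    double expensiveCompute(long long key) { return key * 2.0; } // stand-in

    double lookup(long long key) {
        if (key != cachedKey) {           // cache miss: recalculate
            cachedValue = expensiveCompute(key);
            cachedKey = key;
        }
        return cachedValue;
    }

    int main() {
        // Someone later derives keys with %.  In C++, -7 % 3 == -1,
        // which silently matches the "empty" sentinel...
        long long key = -7 % 3;
        // ...so lookup() returns the stale cachedValue (0.0) without
        // ever calling expensiveCompute: the classic buried-sentinel bug.
        std::printf("lookup = %f (expected %f)\n",
                    lookup(key), expensiveCompute(key));
    }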
If the data can take the full range of values, how do we pick an invalid identifier? Well, it's very, very naughty, but if you have a 64-bit number you can pick something at random and use that. Don't tell anyone I said that though. As 2 to the power 64 is roughly 1.8 times 10 to the 19th, the odds of an accidental collision are about 10,000,000,000,000,000,000 : 1, provided you pick something that's really, really random. Even a CPU counting up at 1 GHz would take nearly 600 years to visit every number (about 300 on average before it lands on any particular value). So don't use this for mission-critical coding, but it's a dirty workaround that works in the real world.
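A sketch of picking such a sentinel (the names are invented); the standard <random> header gives you a full-range 64-bit draw:

    #include <cstdint>
    #include <cstdio>
    #include <random>

    int main() {
        std::random_device rd;                              // entropy source
        std::mt19937_64 gen(rd());                          // 64-bit generator
        std::uniform_int_distribution<std::uint64_t> dist;  // full 64-bit range
        const std::uint64_t CACHE_EMPTY = dist(gen);        // the dirty sentinel

        std::uint64_t cachedKey = CACHE_EMPTY;  // "nothing cached yet"
        std::printf("sentinel = %llu\n", (unsigned long long)CACHE_EMPTY);
        (void)cachedKey;
    }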
The next issue is flushing the cache. Sometimes you need to recalculate regardless. I use the method name defeatOptimization in my classes so I always know what to call (a sketch is below). If you are paranoid you can even call defeatOptimization once a second "just in case". It's a bit like NASA banning while loops in favour of a for loop that exits after some ridiculously large number of iterations. This is why satellites and rovers sometimes mysteriously come back to life: they wake up when they eventually fall out of a for loop that some paranoid old programmer insisted on, to cover the risk of code going astray.
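Here's roughly what such a class might look like; everything except the defeatOptimization name is invented for illustration:

    #include <cmath>
    #include <cstdio>

    class CachedResult {
    public:
        double get(double input) {
            if (!valid || input != lastInput) {  // recalc on a miss
                value = std::sqrt(input) * std::cos(input); // stand-in work
                lastInput = input;
                valid = true;
            }
            return value;
        }

        // Force the next get() to recalculate regardless of the cache.
        void defeatOptimization() { valid = false; }

    private:
        double lastInput = 0.0;
        double value = 0.0;
        bool valid = false;
    };

    int main() {
        CachedResult r;
        std::printf("%f\n", r.get(2.0));  // computes
        std::printf("%f\n", r.get(2.0));  // cache hit
        r.defeatOptimization();           // paranoid flush
        std::printf("%f\n", r.get(2.0));  // computes again
    }

And the bounded-loop idea in miniature (the exit test is a placeholder):

    #include <cstdio>

    int main() {
        bool done = false;
        // What would naturally be while (!done), capped so it can
        // never spin forever.
        long i = 0;
        for (i = 0; i < 1'000'000'000L && !done; ++i) {
            if (i == 42) done = true;  // stand-in for the real exit test
        }
        std::printf("exited at i = %ld\n", i);
    }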
There were probably one or two other things I meant to say about caching, but that's enough of me taking air away from doing actual work...