@princemio I found that you lose two frames with the ofxFastFbo reader (and you can set the number of buffers). Losing a couple of frames is generally common with a projector or camera anyway, so the latency wasn't that bad. This means if it's frame 1000, your readback of the fbo would give you frame 998...
In my case, I was matching data between CPU and GPU, so I basically stored the last n frames of data and matched them up with the readback results. My problem was that I had to render a lot of things in the fbo and I was kind of CPU constrained, so I took much worse hits if I rendered in RAM (using cairo, opencv or something I rolled myself). Rendering in the fbo and reading it back let me divide some of the work between the CPU and GPU, and it worked out pretty well.
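
Roughly, the "store the last n frames and match them up" part could look like the sketch below. This isn't my actual code and it doesn't touch the ofxFastFbo reader API at all. `CpuFrameData`, `FrameHistory` and `readbackFrame` are made-up names for illustration, and the two-frame latency is just the number I saw, so measure it for your own buffer count.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical per-frame CPU state; swap in whatever you need to match
// against the pixels that come back from the GPU.
struct CpuFrameData {
    std::uint64_t frame = 0;
    float value = 0.0f;
};

// Keeps the last `capacity` frames of CPU data in a ring buffer,
// indexed by frame number modulo the capacity.
class FrameHistory {
public:
    explicit FrameHistory(std::size_t capacity)
        : slots_(capacity == 0 ? 1 : capacity) {}

    // Store this frame's CPU data, overwriting the oldest slot.
    void store(const CpuFrameData& d) {
        slots_[d.frame % slots_.size()] = d;
    }

    // Return the CPU data recorded for `frame`, or nullopt if that
    // frame was never stored or has already been overwritten.
    std::optional<CpuFrameData> find(std::uint64_t frame) const {
        const auto& slot = slots_[frame % slots_.size()];
        if (slot.has_value() && slot->frame == frame) return *slot;
        return std::nullopt;
    }

private:
    std::vector<std::optional<CpuFrameData>> slots_;
};

// Which frame a readback belongs to, assuming a fixed latency in frames
// (2 in my case). Returns nullopt until enough frames have been rendered.
inline std::optional<std::uint64_t> readbackFrame(std::uint64_t currentFrame,
                                                  std::uint64_t latency) {
    if (currentFrame < latency) return std::nullopt;
    return currentFrame - latency;
}
```

Each frame you would call `store()` with the current frame's data, then when a readback arrives call `readbackFrame(currentFrame, 2)` and `find()` the result. The history only needs to be a bit larger than the latency, so a capacity of 4 is plenty for a two-frame delay.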