Hi,

IIRC the first block is written at the end, so in a sub-patch reblocked to 256 you have:

First vector: X X X A
Then: B C D E

That probably explains why the latency is (256 - 64) = 192.

HTH.
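
P.S. In case it helps, here is a toy model of the above. It is only a sketch under my own assumptions, not how the engine actually implements reblocking: the parent runs at 64, the sub-patch at 256, each incoming block is written at the end of the sub-patch input buffer, the sub-patch runs on the first parent tick and then every fourth tick, and the processing is the identity. The function and parameter names are made up for illustration.

```python
def simulate_reblock(signal, parent_block=64, sub_block=256):
    """Toy model of a reblocked sub-patch (hypothetical, identity processing)."""
    ratio = sub_block // parent_block      # parent ticks per sub-patch vector
    in_buf = [0.0] * sub_block             # the "X" slots start out as zeros
    out_buf = [0.0] * sub_block
    out = []
    for tick in range(len(signal) // parent_block):
        chunk = signal[tick * parent_block:(tick + 1) * parent_block]
        # Assumption: the newest parent block is written at the END of the buffer.
        in_buf = in_buf[parent_block:] + chunk
        phase = tick % ratio
        if phase == 0:
            # The sub-patch computes one full vector: X X X A, then B C D E, ...
            out_buf = list(in_buf)
        # The parent reads the computed vector back one slot per tick.
        out.extend(out_buf[phase * parent_block:(phase + 1) * parent_block])
    return out


impulse = [1.0] + [0.0] * 1023             # a single click at sample 0
latency = simulate_reblock(impulse).index(1.0)
print(latency)                             # prints 192, i.e. 256 - 64
```

With this model the click in block A only leaves the sub-patch in the fourth slot of the first vector, which is where the (256 - 64) comes from.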