With 4x subsampling ratios (log2 == 2), as can arise with 4:1:1 or
4:1:0, a value range of 512 is not enough to cover every possible
chroma position.
For example, bottom-sited chroma in 4:1:0 would require an offset of 768
(three luma rows). Simply double the limit to 1024. I don't see any
place in initFilter() that would experience overflow as a result of this
change, especially since get_local_pos() right-shifts the value by the
log2 subsampling ratio again.
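
For reference, the arithmetic behind the 768 figure can be sketched as
below. This is only an illustration, not code from the tree: it assumes
positions are expressed in units of 1/256 of a luma sample (as implied by
768 == three luma rows) and that the limit is inclusive. The helper names
bottom_sited_offset() and fits_limit() are hypothetical.

```c
#include <stdio.h>

/* Assumption: one luma row corresponds to 256 position units,
 * which is what "768 (three luma rows)" implies. */
#define UNITS_PER_LUMA_ROW 256

/* Hypothetical helper: offset of a bottom-sited chroma sample, i.e. one
 * aligned with the last of the (1 << log2_sub) luma rows it covers. */
static int bottom_sited_offset(int log2_sub)
{
    int luma_rows = 1 << log2_sub;
    return (luma_rows - 1) * UNITS_PER_LUMA_ROW;
}

/* Hypothetical helper: does the offset fit within +/- limit?
 * Treating the limit as inclusive is an assumption. */
static int fits_limit(int offset, int limit)
{
    return offset >= -limit && offset <= limit;
}

/* Print the bottom-sited offset for 1x, 2x and 4x subsampling and
 * whether it fits the old (512) and new (1024) limits. */
static void print_offsets(void)
{
    for (int log2_sub = 0; log2_sub <= 2; log2_sub++) {
        int off = bottom_sited_offset(log2_sub);
        printf("log2=%d offset=%d old=%d new=%d\n", log2_sub, off,
               fits_limit(off, 512), fits_limit(off, 1024));
    }
}
```

With 2x subsampling the bottom-sited offset is 256 and fits either
limit; with 4x it is 768, which only fits once the limit is 1024.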