* improve `CUDA_TARGET_CPU_ARCH` cache initialization,
allow to override initial value from calling script;
* add `CUDA_TARGET_OS_VARIANT` option to select OS variant;
* add `CUDA_TARGET_TRIPLET` option to select target triplet from
`${CUDA_TOOLKIT_ROOT_DIR}/targets` folder;
* remove `CUDA_TOOLKIT_TARGET_DIR` option, now it is calculated from
`CUDA_TARGET_TRIPLET`, the old approach still can be used for compatibility;
* for CUDA 6.5 and newer try to locate static libraries too, because
in 6.5 toolkit for ARM cross compilation only static libraries are included.