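No, CUDA has no arithmetic atomic intrinsics (atomicAdd and the like) for unsigned short and unsigned char, or for any other integer type smaller than 32 bits. The one exception I know of is a 16-bit atomicCAS overload for unsigned short int, which requires compute capability 7.0 or higher.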
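However, you could pack two shorts or four chars into one 32-bit word and perform a single 32-bit atomic on it, updating several values at once. This assumes your computation permits it: for an add, for example, no field may overflow, because the carry would spill into the neighboring field.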
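
As a rough illustration of the packing idea, here is a minimal sketch in which two 16-bit counters share one 32-bit word and every thread updates both with a single atomicAdd. The names packShorts, addPacked and runPackedDemo are made up for this example and are not part of CUDA. It assumes neither counter ever exceeds 65535.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper (not a CUDA API): pack two 16-bit values into one
// 32-bit word, with `lo` in bits 0-15 and `hi` in bits 16-31.
__host__ __device__ inline unsigned int packShorts(unsigned short lo,
                                                   unsigned short hi)
{
    return ((unsigned int)hi << 16) | (unsigned int)lo;
}

// Every thread adds 1 to the low counter and 2 to the high counter using
// one 32-bit atomicAdd. This is only valid while the low counter stays
// below 65536; otherwise its carry corrupts the high counter.
__global__ void addPacked(unsigned int *packed)
{
    atomicAdd(packed, packShorts(1, 2));
}

// Host-side driver for the sketch: launches 4 blocks of 256 threads
// (1024 threads in total) and unpacks the two counters afterwards.
bool runPackedDemo(unsigned short *lo, unsigned short *hi)
{
    unsigned int *dPacked = nullptr;
    if (cudaMalloc(&dPacked, sizeof(unsigned int)) != cudaSuccess)
        return false;
    cudaMemset(dPacked, 0, sizeof(unsigned int));

    addPacked<<<4, 256>>>(dPacked);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    unsigned int hPacked = 0;
    cudaError_t err = cudaMemcpy(&hPacked, dPacked, sizeof(hPacked),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dPacked);

    *lo = (unsigned short)(hPacked & 0xFFFFu);  // low 16 bits
    *hi = (unsigned short)(hPacked >> 16);      // high 16 bits
    return err == cudaSuccess;
}
```

If the fields can overflow, or you need to update just one short in place, the usual alternative is a compare-and-swap loop with the 32-bit atomicCAS on the aligned word that contains the short.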