UniD3: unified discrete diffusion for simultaneous vision-language generation
