Generating Illustrations for Children's Books Using Generative Adversarial Networks
Abstract
This thesis presents a first-of-its-kind method for creating children's book illustrations. Our method takes edge maps or sketches as input and generates images and/or videos in the style of a selected illustrator. To achieve this, we developed three novel models: an image colorization model, an image stylization model, and a video translation model. Our proposed sketch-to-image translation model integrates an adversarial segmentation loss into baseline methods, which greatly improves FID and mIoU scores. We show that although current state-of-the-art image-to-image translation models successfully transfer either the style or the content, they fail to transfer both at the same time. We propose a new generator network to address this issue and show that the resulting network strikes a better balance between style and content.
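As an illustration only, the sketch below (PyTorch) shows one way an adversarial segmentation loss could be added on top of a baseline sketch-to-image generator objective; the networks D_img, D_seg, and S, and the weight lambda_seg, are assumed placeholders and not the exact formulation used in the thesis.

import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def generator_loss(D_img, D_seg, S, fake_img, lambda_seg=1.0):
    """Adversarial image loss plus an adversarial loss on segmentations.

    D_img: discriminator on generated images
    D_seg: discriminator on segmentation maps
    S:     a (frozen) segmentation network applied to the generated image
    """
    # Standard adversarial term: the generated image should look real.
    pred_img = D_img(fake_img)
    loss_adv = bce(pred_img, torch.ones_like(pred_img))

    # Adversarial segmentation term: the segmentation of the generated image
    # should be indistinguishable from segmentations of real images.
    pred_seg = D_seg(S(fake_img))
    loss_seg = bce(pred_seg, torch.ones_like(pred_seg))

    return loss_adv + lambda_seg * loss_seg

In this sketch, training D_seg on segmentations of real versus generated images is what makes the segmentation term adversarial rather than a plain pixel-wise penalty.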
Current video-to-video translation methods build on an image-to-image translation model and integrate additional networks, such as optical flow estimators or temporal predictors, to capture temporal relations. These additional networks complicate the model and slow down training and inference. We propose a new method for ensuring temporal coherence in video-to-video style transfer: a generator network with feature warping layers that overcomes the limitations of previous methods.
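As a non-authoritative sketch, the module below shows one plausible realization of a feature warping layer inside the generator: previous-frame features are aligned to the current frame via a small learned offset field and bilinear sampling, with no external optical flow network. The FeatureWarp module and its fusion by addition are assumptions for illustration, not the exact architecture of the thesis.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureWarp(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Predict a 2-channel (dx, dy) offset field from the concatenated
        # current and previous feature maps.
        self.offset = nn.Conv2d(2 * channels, 2, kernel_size=3, padding=1)

    def forward(self, feat_cur, feat_prev):
        b, _, h, w = feat_cur.shape
        flow = self.offset(torch.cat([feat_cur, feat_prev], dim=1))

        # Build a normalized sampling grid in [-1, 1] and shift it by the flow.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=feat_cur.device),
            torch.linspace(-1, 1, w, device=feat_cur.device),
            indexing="ij",
        )
        base = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(b, -1, -1, -1)
        grid = base + flow.permute(0, 2, 3, 1)

        # Warp previous-frame features toward the current frame and fuse.
        warped = F.grid_sample(feat_prev, grid, align_corners=True)
        return feat_cur + warped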
Even though current state-of-the-art image stylization models can generate highly artistic images, their quantitative evaluation is still an open problem. We propose a new evaluation framework that considers both the content and the style transfer aspects of an image stylization model.
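For illustration, a minimal version of such a combined evaluation could score content preservation with a VGG feature distance to the source image and style fidelity with a Gram-matrix distance to the illustrator's reference images; the layer choice (relu3_3) and the unweighted scores below are assumptions, not the thesis's exact metrics.

import torch
from torchvision.models import vgg16

vgg = vgg16(weights="DEFAULT").features[:16].eval()  # up to relu3_3

def gram(f):
    # Gram matrix of a feature map, normalized by its size.
    b, c, h, w = f.shape
    f = f.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

@torch.no_grad()
def evaluate(stylized, content, style_ref):
    # stylized, content, style_ref: image tensors of shape [b, 3, H, W].
    f_out, f_cnt, f_sty = vgg(stylized), vgg(content), vgg(style_ref)
    content_score = torch.mean((f_out - f_cnt) ** 2)            # lower = better content preservation
    style_score = torch.mean((gram(f_out) - gram(f_sty)) ** 2)  # lower = closer to target style
    return content_score.item(), style_score.item()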
To train these three models, we collected a unique illustration dataset. Our dataset contains more than 10,000 illustrations from 26 different illustrators, along with object-level bounding box annotations. In its current form, it is the first large-scale illustration image dataset. We conduct an in-depth analysis to show that the dataset is challenging.