DS-Fusion: Artistic Typography via Discriminated and Stylized Diffusion

Abstract

We introduce a novel method to automatically generate artistic typography by stylizing one or more letter fonts to visually convey the semantics of an input word, while ensuring that the output remains readable.

To address the challenges of this task, including conflicting goals (artistic stylization vs. legibility), the lack of ground truth, and an immense search space, our approach uses large language models to bridge text and visual images for stylization and builds an unsupervised generative model with a diffusion model backbone. Specifically, we employ the denoising generator of the Latent Diffusion Model (LDM), with the key addition of a CNN-based discriminator that adapts the input style onto the input text. The discriminator uses rasterized images of a given letter or word font as real samples and the output of the denoising generator as fake samples.
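As a rough illustration of the real/fake setup described above, the sketch below computes a standard GAN-style binary cross-entropy objective from discriminator logits. The loss form and the names `bce_with_logits`, `discriminator_loss`, and `generator_adversarial_loss` are assumptions made for illustration; the text above does not specify the exact objective used by DS-Fusion.

```python
import numpy as np


def bce_with_logits(logits, targets):
    """Numerically stable binary cross-entropy computed on raw logits."""
    logits = np.asarray(logits, dtype=np.float64)
    targets = np.asarray(targets, dtype=np.float64)
    # Standard stable form: max(x, 0) - x * y + log(1 + exp(-|x|))
    per_sample = (
        np.maximum(logits, 0.0)
        - logits * targets
        + np.log1p(np.exp(-np.abs(logits)))
    )
    return per_sample.mean()


def discriminator_loss(real_logits, fake_logits):
    """Assumed discriminator objective.

    real_logits: scores for rasterized glyph images (label 1).
    fake_logits: scores for denoising-generator outputs (label 0).
    """
    real_logits = np.asarray(real_logits, dtype=np.float64)
    fake_logits = np.asarray(fake_logits, dtype=np.float64)
    real_term = bce_with_logits(real_logits, np.ones_like(real_logits))
    fake_term = bce_with_logits(fake_logits, np.zeros_like(fake_logits))
    return real_term + fake_term


def generator_adversarial_loss(fake_logits):
    """Assumed generator objective: push generated samples toward label 1."""
    fake_logits = np.asarray(fake_logits, dtype=np.float64)
    return bce_with_logits(fake_logits, np.ones_like(fake_logits))
```

Under this assumed objective, the discriminator loss is small when glyph rasters score high and generated images score low, while the generator term rewards outputs whose shape the discriminator cannot tell apart from the glyph.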

We name our model DS-Fusion, for discriminated and stylized diffusion.

Pipeline

The pipeline of DS-Fusion takes as input a style prompt and a glyph image. The style images are generated according to the style word and attribute. DS-Fusion first uses a latent diffusion process to construct the latent space of the given style and then introduces a discriminator to blend the style into the glyph shape. A module marked with a lock icon at its bottom right is pre-trained, and its parameters are frozen. The “+” module denotes the iterative noise injection process of diffusion models.
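The noise injection step can be sketched with the closed-form forward process commonly used for diffusion models, which jumps directly to step t instead of adding noise one step at a time. This is a minimal sketch, not the DS-Fusion implementation: the linear schedule, its default values, and the names `make_alpha_bars` and `inject_noise` are illustrative assumptions.

```python
import numpy as np


def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative signal-retention factors for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def inject_noise(z0, t, alpha_bars, rng):
    """Noise a clean latent z0 to diffusion step t.

    Returns the noised latent and the Gaussian noise that was added;
    the noise is what a denoising generator is trained to predict.
    """
    eps = rng.standard_normal(z0.shape)
    a = alpha_bars[t]
    zt = np.sqrt(a) * z0 + np.sqrt(1.0 - a) * eps
    return zt, eps


# Usage: noise a hypothetical 4x8x8 latent to the final step.
alpha_bars = make_alpha_bars()
rng = np.random.default_rng(0)
z0 = np.ones((4, 8, 8))
zt, eps = inject_noise(z0, len(alpha_bars) - 1, alpha_bars, rng)
```

At small t the latent is almost unchanged, and at the final step it is close to pure Gaussian noise, which is why the denoiser can later start generation from random noise.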

Single Letter Stylization

We show compelling single-letter artistic typography results generated by DS-Fusion. Each result composes a generated single-letter glyph with renderings of the remaining letters in the input style word. To render the other letters, we use the same font as the stylized glyph and select a complementing color, which is often the dominant color in the stylized glyph.
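One possible way to pick such a color automatically is to find the most frequent foreground color in the stylized glyph image. The sketch below is only an assumption about how this could be done: the text does not say how the color is chosen, and the function name `dominant_color`, the white background, and the binning parameters are hypothetical.

```python
import numpy as np


def dominant_color(image, background=(255, 255, 255), tolerance=10, bin_size=32):
    """Return the most common non-background RGB color of an image.

    image: array of shape (H, W, 3) with values in 0..255.
    Colors are grouped into coarse bins so that small shading
    differences count as the same color.
    """
    pixels = np.asarray(image, dtype=np.int64).reshape(-1, 3)
    # Ignore pixels that are within `tolerance` of the background color.
    distance = np.abs(pixels - np.asarray(background)).max(axis=1)
    foreground = pixels[distance > tolerance]
    if foreground.size == 0:
        return tuple(background)
    bins = foreground // bin_size
    _, inverse, counts = np.unique(
        bins, axis=0, return_inverse=True, return_counts=True
    )
    # Average the original colors that fall in the most populated bin.
    members = foreground[inverse.reshape(-1) == counts.argmax()]
    return tuple(int(round(float(c))) for c in members.mean(axis=0))
```

The returned RGB tuple could then be passed as the fill color when rendering the remaining letters in the same font.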

Multi Letter Stylization

Synthesizing artistic typography from multiple glyphs is more challenging because of the added structural complexity in the inputs and the more global context that must be accounted for. DS-Fusion demonstrates that it can use all the letters of a word, in more than one way, to creatively convey semantic features.