Giant list of Word Cloud Techniques and research papers

Exploring Advanced Word Cloud Techniques and Visual Text Analytics: A Comprehensive Review of Research Papers

The CrawlSpider team is working on an innovative word cloud generator and a Chrome extension that will assist with keyword analysis and on-page SEO audits. During our research we came across many tools and research papers that helped us create our own algorithm. Below is a list of the papers and research material we found, with a brief introduction to each.

List of All Word Cloud tools

This article documents the due diligence we did while trying to build the best word cloud generator. Below you will find Python libraries, JavaScript libraries, and free and paid online tools for generating word clouds or word cluster diagrams. Some of the tools may be defunct or out of service, but you can still find them on archive.org for reference. If you are familiar with quadtrees, collision detection, and circle packing algorithms, you should be able to follow the logic for building a word cloud.
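
To make the collision-detection idea concrete, here is a minimal sketch of a greedy layout: words are placed largest first, and each one walks outward along a spiral until its bounding box no longer overlaps anything already placed. This is not our algorithm or that of any tool listed here. The `Box` class, the `layout` function, the font-size scale, and the character-width estimate are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Box:
    word: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

    def overlaps(self, other: "Box") -> bool:
        # Axis-aligned rectangles collide unless one lies fully to one side of the other.
        return not (
            self.x + self.w <= other.x
            or other.x + other.w <= self.x
            or self.y + self.h <= other.y
            or other.y + other.h <= self.y
        )


def layout(weights, char_w=0.6, step=0.5):
    """Place words largest first, walking outward on a spiral until each one fits."""
    placed = []
    for word, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
        font = 10 + 4 * weight                  # hypothetical size scale
        w, h = font * char_w * len(word), font  # crude text-extent estimate
        t = 0.0
        while True:
            # Archimedean spiral: the radius grows linearly with the angle t.
            cx, cy = step * t * math.cos(t), step * t * math.sin(t)
            box = Box(word, cx - w / 2, cy - h / 2, w, h)
            if not any(box.overlaps(p) for p in placed):
                placed.append(box)
                break
            t += 0.1
    return placed


boxes = layout({"cloud": 5, "word": 4, "layout": 3, "spiral": 2, "quadtree": 1})
for b in boxes:
    print(f"{b.word:10s} x={b.x:7.1f} y={b.y:7.1f} w={b.w:6.1f} h={b.h:5.1f}")
```

The `any(...)` scan checks every placed box, which is fine for a handful of words. A quadtree would replace that scan so that only nearby boxes are tested.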

In addition, we have documented a ton of research articles, so enjoy!

Here are some popular Python libraries for generating word clouds:

  1. WordCloud
    • One of the most popular libraries for generating word clouds in Python. It offers a variety of customization options, including word frequencies, stop words, font color, and shape.
    • Installation: pip install wordcloud
    • Documentation
  2. matplotlib
    • While primarily a plotting library, matplotlib can be used in conjunction with other libraries (like WordCloud) to display word clouds and adjust them in terms of size, color, and layout.
    • Installation: pip install matplotlib
    • Documentation
  3. Pillow (PIL Fork)
    • A Python Imaging Library (PIL) fork that can be used to manipulate images and text in combination with word cloud libraries for customizing word cloud shapes, colors, and backgrounds.
    • Installation: pip install pillow
    • Documentation
  4. Numpy
    • Although Numpy is not specifically a word cloud library, it can be useful when working with arrays and images, especially when manipulating masks and colors for generating complex word clouds.
    • Installation: pip install numpy
    • Documentation
  5. PyTagCloud
    • A simple library to generate customizable word clouds in SVG, PNG, or HTML. It allows for flexible customization of word appearance and layout.
    • Installation: pip install pytagcloud
    • Documentation
  6. Tagul
    • An online word cloud generator that also offers a Python API for generating word clouds. Tagul allows advanced customization and offers different word cloud shapes.
    • No direct installation via pip; API usage requires an account with Tagul.
  7. pytorch-wordcloud
    • Combines PyTorch’s deep learning capabilities with word cloud generation, especially useful if you want to work on complex NLP models and generate word clouds based on model results.
    • Installation: pip install pytorch-wordcloud
    • Documentation

These libraries and tools provide diverse functionalities and are useful in a variety of word cloud generation projects, depending on your specific needs.
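
Whichever library you pick, the first step is usually the same: count word frequencies and drop stop words. The sketch below does this with the standard library only. The `STOP_WORDS` set and the `word_frequencies` helper are illustrative assumptions, not part of any library listed above.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; real projects use a much longer one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for"}


def word_frequencies(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into words, and count the non-stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stop_words and len(t) > 1)


freqs = word_frequencies("The word cloud is a cloud of words, and the cloud grows.")
print(freqs.most_common(3))

# A word-to-count mapping like this is the kind of input that
# WordCloud.generate_from_frequencies in the wordcloud package accepts.
```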

Here are some popular JavaScript libraries for generating word clouds:

  1. wordcloud2.js
    • A lightweight and flexible word cloud library for generating word clouds in HTML5 canvas elements. It offers various customization options, including color, font, and word placement.
    • Installation: npm install wordcloud or use a CDN.
    • GitHub Repository
  2. d3-cloud
    • A word cloud layout generator for the popular D3.js visualization library. It supports SVG rendering and allows fine control over the layout, font, rotation, and word frequency.
    • Installation: npm install d3-cloud
    • GitHub Repository
  3. jQCloud
    • A jQuery plugin that generates simple word clouds. It provides basic customization options like font size, color, and word placement, and works well with small word sets.
    • Installation: npm install jqcloud2
    • GitHub Repository
  4. TagCanvas
    • A 3D interactive tag/word cloud library that uses HTML5 canvas. Words can rotate and move interactively, creating visually appealing word clouds.
    • Website & Documentation
  5. React-wordcloud
    • A word cloud component for React applications. It wraps around d3-cloud to make it easier to integrate word clouds in React apps with support for advanced customization.
    • Installation: npm install react-wordcloud
    • GitHub Repository
  6. Cloud 9
    • A flexible word cloud generator built on JavaScript and HTML5 canvas. It allows users to easily configure the appearance and layout of the word cloud.
  7. Vis.js Word Cloud
    • Part of the Vis.js suite for data visualization, it provides options for generating customizable word clouds and supports a variety of text visualization tasks.
    • Installation: npm install vis-network
    • GitHub Repository
  8. Fomantic-UI Word Cloud
    • A word cloud module based on Semantic-UI that allows for responsive, simple word clouds using CSS classes. It’s useful for those already using the Fomantic-UI or Semantic-UI framework.
    • Fomantic UI Documentation
  9. Highcharts Word Cloud
    • A module of the Highcharts library designed to generate word clouds. It is ideal for users familiar with Highcharts who want to integrate word clouds into interactive charts.
    • Installation: npm install highcharts
    • Highcharts Word Cloud Documentation

These libraries provide a range of options for generating word clouds in JavaScript, from lightweight and flexible solutions to more interactive or 3D-based clouds. Depending on your use case, whether it’s React integration or interactive 3D clouds, you can choose the one that best fits your needs.

Here is a list of both free and paid online services for generating word clouds:

Free Word Cloud Generators:

  1. WordArt (formerly Tagul)
    • Free tier available, with advanced customization options like shapes, fonts, colors, and word layouts. It allows users to save and share word clouds.
    • Paid plans unlock higher resolution images and additional customization.
    • Visit WordArt
  2. WordClouds.com
    • Free word cloud generator with various customization options including fonts, shapes, colors, and orientations. Users can save word clouds as images or PDFs.
    • Visit WordClouds
  3. Jason Davies Word Cloud
    • A free, open-source tool offering simple and customizable word clouds. It supports various orientations and sizes but lacks more complex visual customization features.
    • Visit Jason Davies Word Cloud
  4. TagCrowd
    • A free, simple word cloud generator that allows users to paste text, upload files, or enter URLs. It provides limited customization options.
    • Visit TagCrowd
  5. WordItOut
    • A free tool that creates word clouds from text or data and offers simple customization for font, color, and layout. It is easy to use but offers limited advanced features.
    • Visit WordItOut
  6. ABCya Word Cloud
    • A free, simple word cloud generator primarily aimed at educational use. It offers basic customization such as word size, color, and font.
    • Visit ABCya Word Cloud
  7. MonkeyLearn Word Cloud Generator
    • A free tool that allows you to create word clouds from text. You can paste text or upload files to generate your word cloud, and the tool provides basic options for customization.
    • Visit MonkeyLearn Word Cloud Generator

Paid Word Cloud Generators:

  1. ProWordCloud
    • Offers both free and paid versions. The paid version includes high-resolution word cloud exports and advanced customization. Great for businesses or users needing premium design and quality.
    • Visit ProWordCloud
  2. WordSift
    • Primarily a free tool but can be extended with premium features for advanced users. Designed for educators, it offers word cloud generation with word highlighting based on frequency and text context analysis.
    • Visit WordSift
  3. Canva Word Cloud Generator
    • Canva offers word cloud generation as part of its paid plan. It integrates with its powerful design tools for more advanced customizations. Canva is great for designing presentations, posters, and social media graphics.
    • Visit Canva
  4. Tagxedo
    • Free for basic use but offers paid plans for higher resolution downloads and commercial usage. Tagxedo allows for word clouds in various shapes, including custom shapes, and offers plenty of customization options.
    • Visit Tagxedo
  5. Wordle.net (Archived)
    • Wordle was a popular free tool for generating simple word clouds, but it has been archived and is no longer actively supported. However, some versions may still be available online. The algorithm that wordle.net used in its Java program is well described by the author in this thread.
  6. Vizzlo Word Cloud
    • A premium tool with a free version offering limited access. It provides business-grade word clouds for presentations and reports. The paid version unlocks higher-quality exports, custom fonts, and additional themes.
    • Visit Vizzlo
  7. Mentimeter Word Cloud

Freemium Word Cloud Generators:

(Free with basic features, paid for premium features)

  1. WordCloudMaker
    • A freemium service that offers free basic word cloud generation, but with a paid option for more advanced features such as custom shapes and fonts.
    • Visit WordCloudMaker
  2. WordCloud Generator by Zingsoft
    • Offers a free tier with basic word cloud creation and customization. Paid plans include enhanced features like higher resolution and greater customization for businesses or advanced users.
    • Visit WordCloud Generator by Zingsoft

These services offer various levels of functionality, ranging from simple and free tools to more advanced paid options, catering to both casual and professional needs.

List of All Word Cloud and Keyword-related research articles

Taking Word Clouds Apart: An Empirical Investigation of the Design Space for Keyword Summaries
Authors: Cristian Felix, S. Franconeri, E. Bertini (2018)
This paper presents four user studies that explore the visual design space for keyword summaries. The authors created a design space of possible visual representations, comparing solutions based on tasks and performance metrics. They studied how different visual representations affect user performance in extracting information from keyword summaries and provided guidelines for designing effective keyword summaries. The study showed strong dependency on the tasks users performed, helping to clarify scattered existing literature on word cloud effectiveness.
Taking Word Clouds Apart: An Empirical Investigation of the Design Space for Keyword Summaries

Semantic Wordification of Document Collections
Authors: F. Paulovich, F. Toledo, G. P. Telles, R. Minghim, L. G. Nonato (2012)
This paper introduces ProjCloud, a novel approach for generating document clouds that preserve semantic relationships among words. The paper emphasizes the need to link word clouds to the underlying document sets they represent. ProjCloud uses multidimensional projection to maintain semantic consistency and visualize relationships between documents and their corresponding word clouds. Additionally, a new algorithm for building word clouds inside polygons ensures semantic coherence among words.
Semantic Wordification of Document Collections

ReCloud: Semantics-Based Word Cloud Visualization of User Reviews
Authors: Ji Wang, Jian Zhao, Sheng Guo, Chris North, Naren Ramakrishnan (2014)
ReCloud applies grammatical dependency parsing to create a semantic graph of user review content, arranged using a force-directed layout to generate clustered word clouds. This technique reveals semantic relationships in user reviews, enhancing user comprehension and improving performance in extracting insights from large review datasets, compared to randomized layout word clouds. The authors compared their method with other review reading techniques, showing that ReCloud offers significant improvements.
ReCloud: Semantics-Based Word Cloud Visualization of User Reviews

Word Storms: Multiples of Word Clouds for Visual Comparison of Documents
Authors: Quim Castellà, Charles Sutton (2014)
This paper introduces “word storms,” a technique for visually comparing multiple documents using word clouds. A novel algorithm ensures that words appearing in multiple documents are placed in the same location across different clouds. This method facilitates easier visual comparison by maintaining consistent color, location, and orientation for words across documents. The evaluation showed that word storms significantly improve the ability to visually compare documents.
Word Storms: Multiples of Word Clouds for Visual Comparison of Documents

Word Cloud Explorer: Text Analytics Based on Word Clouds
Authors: Florian Heimerl, S. Lohmann, Simon Lange, Thomas Ertl (2014)
Word Cloud Explorer is a prototypical system developed for text analysis using word clouds as the primary visualization method. The system is equipped with natural language processing and interaction techniques that enhance word cloud functionality. Users can perform various text analysis tasks with advanced interaction capabilities, such as filtering and zooming into word clouds. The paper presents a qualitative user study evaluating its effectiveness for solving text analysis tasks.
Word Cloud Explorer: Text Analytics Based on Word Clouds

Context Preserving Dynamic Word Cloud Visualization
Authors: Weiwei Cui, Yingcai Wu, Shixia Liu, Furu Wei, Michelle X. Zhou, Huamin Qu (2010)
This paper introduces a method that combines a trend chart with word clouds to visualize the temporal evolution of document collections. The trend chart encodes the semantic evolution of document content over time, while dynamic word clouds depict keywords at different time points. The method uses geometry meshes and an adaptive force-directed layout to ensure semantic coherence and spatial stability across word clouds, helping users to track changes in document content over time.
Context Preserving Dynamic Word Cloud Visualization

Semantic Word Cloud Generation Based on Word Embeddings
Authors: Jin Xu, Y. Tao, Hai Lin (2016)
This paper proposes a new method for generating semantic word clouds based on word embeddings. The method captures word semantics and constructs a word similarity graph to arrange words compactly while preserving their semantic relationships. Users can interact with the word cloud to explore word meanings and contexts. The method is demonstrated on user-generated reviews across various fields, showing improved readability and aesthetics over traditional word cloud methods.
Semantic Word Cloud Generation Based on Word Embeddings
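
The word similarity graph described above can be sketched roughly as follows, assuming cosine similarity between embedding vectors and a fixed cutoff. The toy vectors, the `threshold` value, and the function names are hypothetical, and the paper's actual construction may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_graph(embeddings, threshold=0.8):
    """Return {(word_a, word_b): similarity} for every pair at or above the threshold."""
    words = sorted(embeddings)
    edges = {}
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            s = cosine(embeddings[a], embeddings[b])
            if s >= threshold:
                edges[(a, b)] = s
    return edges


# Hand-made 3-dimensional vectors standing in for real word embeddings.
toy = {
    "pizza": [0.9, 0.1, 0.0],
    "pasta": [0.8, 0.2, 0.1],
    "laptop": [0.0, 0.1, 0.9],
    "tablet": [0.1, 0.0, 0.8],
}
graph = similarity_graph(toy)
for pair, score in graph.items():
    print(pair, round(score, 3))
```

A layout step would then try to keep words that share an edge close together.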

Semantically Structured Tag Clouds: An Empirical Evaluation of Clustered Presentation Approaches
Authors: Johann Schrammel, Michaela Leitner, M. Tscheligi (2009)
This paper evaluates the effectiveness of semantically structured tag clouds. The authors conducted a series of experiments to compare semantic, alphabetical, and random tag cloud layouts. Results indicate that semantically clustered tag clouds improve performance in specific search tasks and increase attention to smaller tags. Half of the participants preferred semantically structured tag clouds for general search tasks.
Semantically Structured Tag Clouds: An Empirical Evaluation of Clustered Presentation Approaches

Experimental Comparison of Semantic Word Clouds
Authors: L. Barth, S. Kobourov, S. Pupyrev (2014)
This paper compares six different algorithms for creating semantic word clouds, including three new ones developed by the authors. Two of the new algorithms outperform the others by placing related words close to each other, improving word adjacency without compromising on other metrics. The comparison uses two datasets: Wikipedia and research papers.
Experimental Comparison of Semantic Word Clouds

Semantic-Preserving Word Clouds by Seam Carving
Authors: Yingcai Wu, Thomas Provan, Furu Wei, Shixia Liu, K. Ma (2011)
This paper adapts seam carving, a content-aware image resizing technique, to create compact word clouds that maintain semantic relationships. The method optimizes word cloud layouts by removing low-energy regions based on a Gaussian-based energy function, preserving the overall semantic structure. The authors also developed interactive visualization techniques to facilitate visual text analysis and comparison. Case studies demonstrate the effectiveness of the approach.
Semantic-Preserving Word Clouds by Seam Carving

Visual Text Analytics
Authors: C. Collins, Antske Fokkens, A. Kerren, C. Weaver, Angelos Chatzimparmpas (2022)
This paper explores the use of visual analytics for text data, highlighting the complementary strengths of human analysis and machine learning. The report discusses the outcomes of Dagstuhl Seminar 22191, where interdisciplinary working groups examined key areas of visual text analytics, identifying gaps in knowledge and future research directions. The paper emphasizes the integration of visualization techniques with NLP and machine learning to enhance text analysis.
Visual Text Analytics

ContextWing: Pair-wise Visual Comparison for Evolving Sequential Patterns of Contexts in Social Media Data Streams
Authors: Yuheng Zhao, Xinyu Wang, Chen Guo, Min Lu, Siming Chen (2023)
ContextWing is an interactive system for comparing evolving sequential patterns between two data streams. The system generates dynamic topics and sequential patterns, visualizing pair-wise correlations between data streams. The “bilateral wing” metaphor helps users intuitively understand similarities and differences in both temporal and semantic aspects. Case studies and user evaluations confirm the effectiveness of the system in analyzing public opinion on social media.
ContextWing: Pair-wise Visual Comparison for Evolving Sequential Patterns of Contexts in Social Media Data Streams

Visualizing Spatial Semantics of Dimensionally Reduced Text Embeddings
Authors: Wei Liu, Chris North, Rebecca Faust (2024)
This paper presents a gradient-based method for visualizing spatial semantics in dimensionally reduced text embeddings. The method applies gradients to assess the sensitivity of projected documents to the underlying words, helping users explore document similarities. A visualization system integrates spatial word clouds into the document projection space, illustrating important text features and providing practical applications for text analysis.
Visualizing Spatial Semantics of Dimensionally Reduced Text Embeddings

Know Your Audience: The Benefits and Pitfalls of Generating Plain Language Summaries Beyond the ‘General’ Audience
Authors: Tal August, Kyle Lo, Noah A. Smith, Katharina Reinecke (2024)
This paper investigates how language models can generate plain language summaries for different audience types. Through three within-subject studies, the authors found that simplifying text improved readability for readers with little familiarity with a topic, but more familiar readers preferred detailed summaries. The work highlights the trade-offs between simplicity and detail in audience-specific language generation.
Know Your Audience: The Benefits and Pitfalls of Generating Plain Language Summaries Beyond the ‘General’ Audience

Utility and Usability of Intrinsic Tag Maps
Authors: N. Yang, A. Maceachren, E. Domanico (2020)
This study evaluates the utility and usability of intrinsic tag maps, which fit tag clouds inside geographic boundaries to emphasize associations between tags and specific regions. The results show that geographic territory shape significantly impacts information retrieval and user confidence. The paper offers insights into improving tag map designs to enhance readability and performance in search tasks.
Utility and Usability of Intrinsic Tag Maps

Augmenting Scientific Papers with Just-in-Time, Position-Sensitive Definitions of Terms and Symbols
Authors: Andrew Head, Kyle Lo, Marti A. Hearst (2021)
This paper introduces ScholarPhi, an augmented reading interface that surfaces definitions of technical terms and symbols in scientific papers. The system provides position-sensitive tooltips, declutters papers to show how terms are used, and automatically generates glossaries, helping researchers of all levels comprehend complex academic texts.
Augmenting Scientific Papers with Just-in-Time, Position-Sensitive Definitions of Terms and Symbols

Portrayal: Leveraging NLP and Visualization for Analyzing Fictional Characters
Authors: Md. Naimul Hoque, Bhavya Ghai, Kari Kraus, N. Elmqvist (2023)
Portrayal is an interactive visualization tool designed to help writers and scholars analyze fictional characters. The system uses NLP to extract characterization indicators and visualize them, enabling users to explore character dynamics and storylines. The paper reports positive feedback from both writers and scholars, indicating that the tool aids in story revision and analysis.
Portrayal: Leveraging NLP and Visualization for Analyzing Fictional Characters

Parallel Tag Clouds to Explore and Analyze Faceted Text Corpora
Authors: C. Collins, F. Viégas, M. Wattenberg (2009)
This paper introduces Parallel Tag Clouds, a new visualization method for comparing facets of large metadata-rich text corpora. It combines parallel coordinates and traditional tag clouds, providing rich overviews of document collections. The paper addresses challenges such as selecting the best words to visualize and maintaining interactivity with large datasets. The method was tested on over 600,000 US Circuit Court decisions, revealing regional and linguistic differences between courts.
Parallel Tag Clouds to Explore and Analyze Faceted Text Corpora

Word Sense Disambiguation
Authors: R. Mihalcea (2010)
This paper explores methods for word sense disambiguation (WSD), a key challenge in natural language processing. The author compares several algorithms, including the Lesk algorithm, simplified Lesk, and algorithms using synonyms. After evaluating different approaches, the paper concludes that the simplified Lesk algorithm offers the best performance for implementing WSD in Java for the Gujarati language.
Word Sense Disambiguation
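
For readers new to the simplified Lesk algorithm mentioned above, here is a minimal sketch: pick the sense whose dictionary gloss shares the most words with the sentence. The two glosses for "bank", the stop-word list, and the function names are hand-written assumptions for illustration, and the sketch is in Python rather than the Java mentioned above.

```python
STOP = {"a", "an", "the", "of", "in", "on", "and", "or", "to", "that", "is", "at", "by"}

# Hypothetical, hand-written sense inventory; a real system would use a dictionary.
SENSES = {
    "bank": {
        "bank#finance": "an institution that accepts deposits and lends money",
        "bank#river": "sloping land beside a river or lake",
    }
}


def tokenize(text):
    """Lowercase, strip basic punctuation, and drop stop words."""
    return {w.strip(".,;").lower() for w in text.split()} - STOP


def simplified_lesk(word, sentence, senses=SENSES):
    """Return the sense whose gloss overlaps most with the sentence context."""
    context = tokenize(sentence) - {word}
    best, best_overlap = None, -1
    for sense, gloss in senses[word].items():
        overlap = len(context & tokenize(gloss))
        if overlap > best_overlap:  # on a tie, the first listed sense wins
            best, best_overlap = sense, overlap
    return best


print(simplified_lesk("bank", "She sat on the bank of the river and watched the lake"))
print(simplified_lesk("bank", "He went to the bank to deposit money"))
```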

Visualizing Data Using t-SNE
Authors: L. Maaten, Geoffrey E. Hinton (2008)
This paper introduces t-SNE, a technique for visualizing high-dimensional data in two or three dimensions. t-SNE is a variation of Stochastic Neighbor Embedding (SNE) that improves visualization by reducing the crowding of data points at the center of the map. The paper compares t-SNE with other non-parametric visualization techniques, demonstrating its superior performance on various datasets.
Visualizing Data Using t-SNE
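
The crowding fix comes from measuring similarity in the low-dimensional map with a heavy-tailed Student t-distribution (one degree of freedom) instead of a Gaussian. A minimal sketch of that map-space similarity is shown below; the function name and the sample points are assumptions, and the rest of t-SNE (the high-dimensional similarities and the gradient descent) is left out.

```python
def student_t_similarities(points):
    """q_ij = (1 + |y_i - y_j|^2)^-1, normalized over all ordered pairs i != j."""
    n = len(points)
    weights = {}
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                weights[(i, j)] = 1.0 / (1.0 + d2)
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}


# Three hypothetical map points: two close together, one far away.
q = student_t_similarities([(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)])
for pair in sorted(q):
    print(pair, round(q[pair], 4))
```

Because the weight falls off as 1 / (1 + distance²) rather than exponentially, distant pairs keep a small but non-negligible similarity, which lets moderately dissimilar points sit further apart in the map.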

Participatory Visualization with Wordle
Authors: F. Viégas, M. Wattenberg, Jonathan Feinberg (2009)
This paper discusses the design and usage of Wordle, a web-based tool for visualizing text. Wordle creates tag-cloud-like displays that balance aesthetic criteria such as typography, color, and composition. The authors describe algorithms used to create Wordle layouts and present results from a large-scale user study, showing that Wordle has become a medium for personal expression and participatory culture.
Participatory Visualization with Wordle

Multi-objective Topic Modeling
Authors: O. Khalifa, D. Corne, M. Chantler, Fraser Halley (2013)
This paper introduces multi-objective approaches to topic modeling, offering an alternative to Latent Dirichlet Allocation (LDA). The authors demonstrate that multi-objective evolutionary algorithms (MOEA) improve topic coherence without sacrificing generalization ability. The paper compares MOEA with LDA, showing that MOEA produces more intuitive and distinct topic models, enhancing interpretability and application.
Multi-objective Topic Modeling

Beautiful Visualization: Looking at Data Through the Eyes of Experts
Authors: Julie Steele, Noah Iliinsky (2010)
This book brings together perspectives from two dozen experts in data visualization, exploring how they approach projects from diverse fields such as art, design, science, and statistics. The book emphasizes the role of storytelling in visualization and provides insights into how visualization can help make sense of complex data.
Beautiful Visualization: Looking at Data Through the Eyes of Experts

Visual Explanations: Images and Quantities, Evidence and Narrative
Author: E. Tufte (1997)
In this book, Edward Tufte explores how well-designed graphics can convey complex information efficiently. He discusses principles of graphic design, such as the use of color, space, and typography, to make data comprehensible. Tufte also critiques common mistakes in visual representation and advocates for clear, honest communication through visualizations.
Visual Explanations: Images and Quantities, Evidence and Narrative

BLEU: A Method for Automatic Evaluation of Machine Translation
Authors: K. Papineni, Salim Roukos, T. Ward, Wei-Jing Zhu (2002)
This paper proposes BLEU, an automatic evaluation method for machine translation. BLEU correlates highly with human judgments while being fast, inexpensive, and language-independent. The method is designed to quickly and frequently evaluate translations, minimizing the cost of human labor in the process.
BLEU: A Method for Automatic Evaluation of Machine Translation
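
As a rough illustration of how BLEU scores a translation, the sketch below combines clipped n-gram precisions with a brevity penalty. It is a simplified, single-reference, sentence-level version with no smoothing, whereas BLEU as published is computed over a corpus and can use several references. The function names and example sentences are assumptions.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Single-reference BLEU sketch: geometric mean of clipped precisions times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        c_counts, r_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count at its count in the reference.
        clipped = sum(min(count, r_counts[g]) for g, count in c_counts.items())
        total = sum(c_counts.values())
        if clipped == 0 or total == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: candidates no longer than the reference are penalized.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


print(bleu("the cat sat on the mat", "the cat sat on the mat"))
print(bleu("the the the the", "the cat sat on the mat", max_n=1))
```

The second call shows why clipping matters: repeating "the" four times only earns credit for the two occurrences in the reference.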