Automatic Read Time Calculation in PayloadCMS

This article shows how to set the read time of your content automatically using hooks in PayloadCMS. Displaying how long an article takes to read improves the user experience by telling readers up front how much time they'll need to invest.

Many resources skip an important detail: the text returned by the getTextContent() function does not include the content of blocks. Blocks can hold a significant share of an article's text, so a read time calculated from getTextContent() alone will undercount it.

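Let's start with a utility that finds a field anywhere in a deeply nested field configuration, given its name as a string. Fields inside named groups or tabs are addressed with dot notation, such as meta.content.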


import type { Field } from 'payload'

// Find a field by name in a (possibly nested) field configuration.
// Use dot notation ('group.field', 'tab.field') for fields inside named groups or tabs.
export function findField<T extends Field>(fields: Field[], name: string): T | null {
    for (const field of fields) {
        if ('name' in field && field.name === name) {
            return field as T
        } else if ('fields' in field) {
            let result: Field | null = null

            if ('name' in field) {
                // Named container (e.g. group, array): consume the first path segment
                const [n1, ...n] = name.split('.')
                if (field.name === n1) {
                    result = findField(field.fields, n.join('.'))
                }
            } else {
                // Unnamed container (e.g. row, collapsible): search with the full name
                result = findField(field.fields, name)
            }

            if (result) {
                return result as T
            }
        } else if ('tabs' in field) {
            for (const tab of field.tabs) {
                let result: Field | null = null

                if ('name' in tab) {
                    // Named tab: consume the first path segment
                    const [n1, ...n] = name.split('.')
                    if (tab.name === n1) {
                        result = findField(tab.fields, n.join('.'))
                    }
                } else {
                    // Unnamed tab: search with the full name
                    result = findField(tab.fields, name)
                }

                if (result) {
                    return result as T
                }
            }
        }
    }

    return null
}

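The findField function takes an array of fields and the name of the field you're looking for, and searches the configuration recursively. A named field matches when its name equals the search string. Named containers (groups, arrays and named tabs) consume the first segment of a dot-separated name, so meta.content finds the content field inside a group or tab called meta, while unnamed containers (rows, collapsibles and unnamed tabs) are searched with the full name. If nothing matches, the function returns null.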
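To see how dot-separated names resolve, here is a stripped-down version of the same lookup that runs without Payload. SimpleField is a hypothetical, simplified stand-in for Payload's Field type that models only the properties the lookup needs, and findSimpleField is an illustrative name, not part of Payload.

```typescript
// Hypothetical, simplified stand-in for Payload's Field type.
interface SimpleField {
  name?: string
  type: string
  fields?: SimpleField[]
  tabs?: { name?: string; fields: SimpleField[] }[]
}

function findSimpleField(fields: SimpleField[], path: string): SimpleField | null {
  const [head, ...rest] = path.split('.')

  for (const field of fields) {
    // A named field whose name equals the whole path is a match.
    if (field.name === path) return field

    // Containers: named ones consume the first path segment,
    // unnamed ones (rows, collapsibles, unnamed tabs) are searched with the full path.
    const containers: { name?: string; fields: SimpleField[] }[] = field.fields
      ? [{ name: field.name, fields: field.fields }]
      : field.tabs ?? []

    for (const container of containers) {
      let result: SimpleField | null = null
      if (container.name === undefined) {
        result = findSimpleField(container.fields, path)
      } else if (container.name === head && rest.length > 0) {
        result = findSimpleField(container.fields, rest.join('.'))
      }
      if (result) return result
    }
  }

  return null
}

const exampleFields: SimpleField[] = [
  { name: 'title', type: 'text' },
  {
    type: 'tabs',
    tabs: [
      { fields: [{ name: 'content', type: 'richText' }] },
      {
        name: 'meta',
        fields: [{ type: 'row', fields: [{ name: 'description', type: 'textarea' }] }],
      },
    ],
  },
]

console.log(findSimpleField(exampleFields, 'content')?.type) // richText
console.log(findSimpleField(exampleFields, 'meta.description')?.type) // textarea
console.log(findSimpleField(exampleFields, 'description')) // null
```

The last call returns null because description sits behind the named meta tab, so it can only be reached as meta.description.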

Next, we need a hook that can be reused across multiple collections, whatever their specific structure. This lets you apply the same logic to different types of content without rewriting the code for each collection.


import type { CollectionBeforeChangeHook, RichTextField } from 'payload'

import { findField } from 'src/utilities/findField'

export const updateReadTime: (fieldName: string) => CollectionBeforeChangeHook =
    (fieldName) =>
    ({ data, collection }) => {
        const contentField = findField<RichTextField>(collection.fields, fieldName)

        if (contentField) {
            // ...
        }

        // Always hand the data back so Payload saves it
        return data
    }

The snippet above defines updateReadTime. Strictly speaking, it is a hook factory: a function that takes a field name and returns a CollectionBeforeChangeHook. Here's how it works:

The updateReadTime function accepts a fieldName parameter. This is the name of the field where your content is stored.
Inside the hook, it uses the findField utility to locate the specified field within the collection's fields.
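The factory pattern itself doesn't depend on Payload. The sketch below mimics it with a hypothetical, simplified hook type (BeforeChangeLike) and a hypothetical getValueAtPath helper that reads the configured field from the incoming data, including dot-separated paths such as layout.body. The wordCount property is only a stand-in for the real read-time calculation.

```typescript
// Hypothetical, simplified stand-in for Payload's CollectionBeforeChangeHook:
// it receives the incoming data and returns the data to save.
type DocData = Record<string, unknown>
type BeforeChangeLike = (args: { data: DocData }) => DocData

// Hypothetical helper: read a value at a dot-separated path such as 'meta.content'.
function getValueAtPath(data: DocData, path: string): unknown {
  let current: unknown = data
  for (const key of path.split('.')) {
    if (current === null || typeof current !== 'object') return undefined
    current = (current as Record<string, unknown>)[key]
  }
  return current
}

// Factory: fieldName is captured in the closure, so each collection can
// register its own instance of the same hook.
function makeContentHook(fieldName: string): BeforeChangeLike {
  return ({ data }) => {
    const content = getValueAtPath(data, fieldName)
    // Stand-in for the real calculation: only plain strings are handled here.
    if (typeof content !== 'string') {
      return data // nothing to measure: hand the data back unchanged
    }
    return { ...data, wordCount: content.split(/\s+/).filter(Boolean).length }
  }
}

const postsHook = makeContentHook('content')
const pagesHook = makeContentHook('layout.body')

console.log(postsHook({ data: { content: 'Three short words' } }).wordCount) // 3
console.log(pagesHook({ data: { layout: { body: 'Two words' } } }).wordCount) // 2
```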

The hook is then registered in a collection configuration. For example, the Posts collection adds updateReadTime to its beforeChange hooks:


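import type { CollectionConfig } from 'payload'

// Adjust this path to wherever you keep the hook
import { updateReadTime } from 'src/hooks/updateReadTime'

export const Posts: CollectionConfig = {
    // ...
    hooks: {
        // ...
        beforeChange: [updateReadTime('content')],
    },
    fields: [
        // ...
        {
            name: 'content',
            type: 'richText',
            required: true,
        },
        {
            name: 'readTime',
            type: 'number',
            admin: {
                // Filled in by the hook, so editors don't need to see it
                hidden: true,
            },
        },
    ],
}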

The Posts collection registers updateReadTime('content') as a beforeChange hook, so the hook runs against the "content" field every time a post is created or updated.
The content field is a required rich text field. This is the field the read time is calculated from.
There's also a readTime field of type number, which stores the calculated read time in minutes. It is hidden in the admin interface because it is filled in automatically and editors don't need to see it.

Most resources stop at this point, using getTextContent() to extract the text from the rich text field:


import {
	getEnabledNodes,
	type LexicalRichTextAdapter,
	type SanitizedServerEditorConfig,
} from '@payloadcms/richtext-lexical'
import { $getRoot } from '@payloadcms/richtext-lexical/lexical'
import { createHeadlessEditor } from '@payloadcms/richtext-lexical/lexical/headless'
import type { CollectionBeforeChangeHook, RichTextField } from 'payload'

import { findField } from 'src/utilities/findField'

export const updateReadTime: (fieldName: string) => CollectionBeforeChangeHook =
    (fieldName) =>
    ({ data, collection }) => {
        const contentField = findField<RichTextField>(collection.fields, fieldName)
        // Read the configured field (not a hard-coded data.content),
        // following dot-separated names such as 'meta.content'
        const editorState = fieldName.split('.').reduce<any>((value, key) => value?.[key], data)

        if (contentField && editorState) {
            const lexicalAdapter: LexicalRichTextAdapter =
                contentField.editor as LexicalRichTextAdapter

            const sanitizedServerEditorConfig: SanitizedServerEditorConfig =
                lexicalAdapter.editorConfig

            // Headless editor that knows every node type enabled for this field
            const headlessEditor = createHeadlessEditor({
                nodes: getEnabledNodes({
                    editorConfig: sanitizedServerEditorConfig,
                }),
            })

            headlessEditor.setEditorState(headlessEditor.parseEditorState(editorState))

            const textContent = headlessEditor.getEditorState().read(() => {
                return $getRoot().getTextContent()
            })

            // Split on any run of whitespace (spaces, tabs, line breaks)
            const words = textContent.split(/\s+/).filter(Boolean)
        }

        return data
    }
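Splitting on single spaces after turning line breaks into spaces leaves tab-separated words glued together, while splitting on any run of whitespace with the regular expression /\s+/ separates them. This small check compares the two on a made-up sample string; splitWords is an illustrative name.

```typescript
// Split text into words on any run of whitespace (spaces, tabs, \n, \r\n)
// and drop the empty strings left by leading or trailing whitespace.
function splitWords(text: string): string[] {
  return text.split(/\s+/).filter(Boolean)
}

const sampleText = 'First line\r\nSecond\tline  with   gaps\n'

// Line breaks turned into spaces, then split on single spaces:
// the tab-separated "Second\tline" survives as a single word.
const viaSpaces = sampleText.replace(/\r?\n|\r/g, ' ').split(' ').filter(Boolean)

console.log(viaSpaces.length) // 5
console.log(splitWords(sampleText).length) // 6
```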

The problem with getTextContent() is that it doesn't return all of the text in your content. This is most noticeable when your content uses blocks: the text stored inside a block, such as the QuoteBlock shown below, is not part of the result, so it never reaches the read time calculation.


import type { Block } from 'payload'

export const QuoteBlock: Block = {
    slug: 'quote',
    interfaceName: 'QuoteBlock',
    fields: [
        {
            type: 'textarea',
            name: 'content',
            required: true,
        },
        {
            type: 'text',
            name: 'author',
            required: true,
        },
    ],
}

To include the text inside blocks in the read time calculation, you need to find the block nodes in the Lexical editor state and append the words from their fields manually. This works around getTextContent() not returning the text of custom blocks.
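The underlying idea can also be sketched without a headless editor, by walking the serialized JSON that is stored for a Lexical field. This sketch assumes the usual serialized shape: a root node with nested children arrays, text nodes of type "text" carrying a text string, and Payload block nodes of type "block" whose fields object holds blockType plus the block's own fields. The interfaces below are hypothetical, simplified stand-ins rather than Payload's exported types, and the trade-off is that only node types handled explicitly contribute text.

```typescript
// Hypothetical, simplified stand-ins for serialized Lexical nodes.
interface SerializedNodeLike {
  type: string
  text?: string
  children?: SerializedNodeLike[]
  fields?: { blockType?: string; [key: string]: unknown }
}

interface EditorStateLike {
  root: SerializedNodeLike
}

// Assumption: quote blocks keep their text in these fields (see QuoteBlock above).
const QUOTE_TEXT_FIELDS = ['content', 'author']

// Walk the serialized tree and build one string containing all readable text,
// including the text fields of quote blocks.
function collectText(state: EditorStateLike): string {
  let out = ''

  const visit = (node: SerializedNodeLike): void => {
    if (node.type === 'text' && typeof node.text === 'string') {
      out += node.text // adjacent text nodes in one paragraph are joined as-is
    } else if (node.type === 'linebreak') {
      out += ' '
    } else if (node.type === 'block' && node.fields) {
      const fields = node.fields
      if (fields.blockType === 'quote') {
        for (const key of QUOTE_TEXT_FIELDS) {
          const value = fields[key]
          if (typeof value === 'string') out += ` ${value} `
        }
      }
    }

    if (node.children) {
      node.children.forEach(visit)
      out += ' ' // an element boundary (paragraph, heading, list item) separates words
    }
  }

  visit(state.root)
  return out
}

const exampleState: EditorStateLike = {
  root: {
    type: 'root',
    children: [
      {
        type: 'paragraph',
        children: [
          { type: 'text', text: 'Reading time ' },
          { type: 'text', text: 'matters.' },
        ],
      },
      {
        type: 'block',
        fields: { blockType: 'quote', content: 'Less is more.', author: 'Mies van der Rohe' },
      },
    ],
  },
}

const exampleWords = collectText(exampleState).split(/\s+/).filter(Boolean)
console.log(exampleWords.length) // 10
```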


import {
	getEnabledNodes,
	type LexicalRichTextAdapter,
	type SanitizedServerEditorConfig,
	type ServerBlockNode,
} from '@payloadcms/richtext-lexical'
import { $getRoot, $isElementNode, type LexicalNode } from '@payloadcms/richtext-lexical/lexical'
import { createHeadlessEditor } from '@payloadcms/richtext-lexical/lexical/headless'
import type { CollectionBeforeChangeHook, RichTextField } from 'payload'

import { findField } from 'src/utilities/findField'

// Split a string into words on any run of whitespace
const toWords = (text: string): string[] => text.split(/\s+/).filter(Boolean)

export const updateReadTime: (fieldName: string) => CollectionBeforeChangeHook =
    (fieldName) =>
    ({ data, collection }) => {
        const contentField = findField<RichTextField>(collection.fields, fieldName)
        // Follow dot-separated names such as 'meta.content' into nested data
        const editorState = fieldName.split('.').reduce<any>((value, key) => value?.[key], data)

        if (contentField && editorState) {
            const lexicalAdapter: LexicalRichTextAdapter =
                contentField.editor as LexicalRichTextAdapter

            const sanitizedServerEditorConfig: SanitizedServerEditorConfig =
                lexicalAdapter.editorConfig

            const headlessEditor = createHeadlessEditor({
                nodes: getEnabledNodes({
                    editorConfig: sanitizedServerEditorConfig,
                }),
            })

            headlessEditor.setEditorState(headlessEditor.parseEditorState(editorState))

            const textContent = headlessEditor.getEditorState().read(() => {
                return $getRoot().getTextContent()
            })

            // blockNode.getTextContent() returns `Block Field`; the global flag
            // removes the placeholder for every block, not just the first one
            let words = toWords(textContent.replace(/Block Field/g, ' '))

            // Collect the text stored inside quote blocks
            const quotesWords = headlessEditor.getEditorState().read(() => {
                const blocks = getNodesByType<ServerBlockNode>('block')
                const blockWords: string[] = []

                for (const block of blocks) {
                    const fields = block.getFields()

                    if (fields.blockType !== 'quote') {
                        continue
                    }

                    blockWords.push(...toWords(String(fields.author ?? '')))
                    blockWords.push(...toWords(String(fields.content ?? '')))
                }

                return blockWords
            })

            words = [...words, ...quotesWords]
        }

        return data
    }

// Depth-first search for every node of the given type.
// Must be called inside editorState.read(), because it uses $getRoot().
function getNodesByType<T extends LexicalNode>(type: string): T[] {
    const root = $getRoot()
    const matchingNodes: T[] = []

    function traverse(node: LexicalNode) {
        if (node.getType() === type) {
            matchingNodes.push(node as T)
        }

        if ($isElementNode(node)) {
            for (const child of node.getChildren()) {
                traverse(child)
            }
        }
    }

    traverse(root)
    return matchingNodes
}
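The comment in the hook notes that each block node contributes the placeholder Block Field to getTextContent(). How that placeholder is removed matters: String.prototype.replace with a plain string pattern replaces only the first match, so a post with several blocks would still count the leftover placeholders as words. This check uses a made-up sample string; one caveat of the global version is that it would also remove the phrase if an author literally wrote "Block Field" in the text.

```typescript
// Made-up sample of getTextContent() output for a post containing two blocks,
// assuming each block contributes the placeholder "Block Field".
const sampleContent = 'Opening paragraph.\n\nBlock Field\n\nClosing thoughts here.\n\nBlock Field'

// Count words separated by any run of whitespace.
function countWords(text: string): number {
  return text.split(/\s+/).filter(Boolean).length
}

// A string pattern only removes the first placeholder...
const firstOnly = sampleContent.replace('Block Field', '')
// ...while a global regular expression removes all of them.
const allRemoved = sampleContent.replace(/Block Field/g, '')

console.log(countWords(firstOnly)) // 7
console.log(countWords(allRemoved)) // 5
```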

You can apply this approach to any block in your PayloadCMS setup. Whether it's a quote block, an image caption, or any other custom block, adapt the logic so that every text field in the block counts toward the read time.
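One way to handle several block types, sketched here with hypothetical block slugs (quote, imageWithCaption, callout) and field names, is a map from each slug to the fields whose text should count. The block's fields are treated as a plain record, similar to how block.getFields() is used in the hook; the slugs, field names and the wordsFromBlock helper are assumptions to replace with your own. Inside the hook, the quote-specific branch could then become roughly one call like this per block node.

```typescript
// Hypothetical block slugs and field names; replace them with your own blocks.
const TEXT_FIELDS_BY_BLOCK = new Map<string, string[]>([
  ['quote', ['content', 'author']],
  ['imageWithCaption', ['caption']],
  ['callout', ['title', 'body']],
])

// Return the words contributed by one block's fields, chosen by its blockType.
function wordsFromBlock(fields: Record<string, unknown>): string[] {
  const blockType = typeof fields['blockType'] === 'string' ? fields['blockType'] : ''
  const words: string[] = []

  for (const key of TEXT_FIELDS_BY_BLOCK.get(blockType) ?? []) {
    const value = fields[key]
    // Only string values count; numbers, uploads, relationships etc. are skipped.
    if (typeof value === 'string') {
      words.push(...value.split(/\s+/).filter(Boolean))
    }
  }

  return words
}

console.log(wordsFromBlock({ blockType: 'quote', content: 'Less is more.', author: 'Mies' }).length) // 4
console.log(wordsFromBlock({ blockType: 'video', url: 'https://example.com/clip' }).length) // 0
```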

Now that all the text is extracted, calculating the read time is straightforward. The formula uses the reading rate from Marc Brysbaert's 2019 meta-analysis of reading speed, which estimated that adults silently read English non-fiction at about 238 words per minute. The reading time in minutes is therefore:

Reading Time (minutes) = Total Word Count / 238, rounded up to the next whole minute
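As a quick check of the formula, here is a minimal sketch of it as a standalone function. The name readTimeMinutes and the optional wordsPerMinute parameter are illustrative, not part of Payload. For example, a 1,200-word article gives 1200 / 238 ≈ 5.04, which rounds up to 6 minutes.

```typescript
// Average silent reading rate for English non-fiction (Brysbaert, 2019).
const WORDS_PER_MINUTE = 238

// Reading time in whole minutes; Math.ceil counts a partial minute as a full one.
function readTimeMinutes(wordCount: number, wordsPerMinute: number = WORDS_PER_MINUTE): number {
  return Math.ceil(wordCount / wordsPerMinute)
}

console.log(readTimeMinutes(238)) // 1
console.log(readTimeMinutes(239)) // 2
console.log(readTimeMinutes(1200)) // 6
```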


export const updateReadTime: (fieldName: string) => CollectionBeforeChangeHook =
    (fieldName) =>
    ({ data, collection }) => {
        const contentField = findField<RichTextField>(collection.fields, fieldName)

        if (contentField) {
            // ...

            words = [...words, ...quotesWords]

            return {
                ...data,
                readTime: Math.ceil(words.length / 238)
            }
        }

        // Nothing to measure: return the data unchanged
        return data
    }

The final hook implementation looks like this:


import {
	getEnabledNodes,
	type LexicalRichTextAdapter,
	type SanitizedServerEditorConfig,
	type ServerBlockNode,
} from '@payloadcms/richtext-lexical'
import { $getRoot, $isElementNode, type LexicalNode } from '@payloadcms/richtext-lexical/lexical'
import { createHeadlessEditor } from '@payloadcms/richtext-lexical/lexical/headless'
import type { CollectionBeforeChangeHook, RichTextField } from 'payload'

import { findField } from 'src/utilities/findField'

// Average silent reading rate for English non-fiction (Brysbaert, 2019)
const WORDS_PER_MINUTE = 238

// Split a string into words on any run of whitespace
const toWords = (text: string): string[] => text.split(/\s+/).filter(Boolean)

export const updateReadTime: (fieldName: string) => CollectionBeforeChangeHook =
    (fieldName) =>
    ({ data, collection }) => {
        const contentField = findField<RichTextField>(collection.fields, fieldName)
        // Follow dot-separated names such as 'meta.content' into nested data
        const editorState = fieldName.split('.').reduce<any>((value, key) => value?.[key], data)

        if (contentField && editorState) {
            const lexicalAdapter: LexicalRichTextAdapter =
                contentField.editor as LexicalRichTextAdapter

            const sanitizedServerEditorConfig: SanitizedServerEditorConfig =
                lexicalAdapter.editorConfig

            const headlessEditor = createHeadlessEditor({
                nodes: getEnabledNodes({
                    editorConfig: sanitizedServerEditorConfig,
                }),
            })

            headlessEditor.setEditorState(headlessEditor.parseEditorState(editorState))

            const textContent = headlessEditor.getEditorState().read(() => {
                return $getRoot().getTextContent()
            })

            // blockNode.getTextContent() returns `Block Field`; the global flag
            // removes the placeholder for every block, not just the first one
            let words = toWords(textContent.replace(/Block Field/g, ' '))

            // Collect the text stored inside quote blocks
            const quotesWords = headlessEditor.getEditorState().read(() => {
                const blocks = getNodesByType<ServerBlockNode>('block')
                const blockWords: string[] = []

                for (const block of blocks) {
                    const fields = block.getFields()

                    if (fields.blockType !== 'quote') {
                        continue
                    }

                    blockWords.push(...toWords(String(fields.author ?? '')))
                    blockWords.push(...toWords(String(fields.content ?? '')))
                }

                return blockWords
            })

            words = [...words, ...quotesWords]

            // Whole minutes, rounded up
            return {
                ...data,
                readTime: Math.ceil(words.length / WORDS_PER_MINUTE),
            }
        }

        // Nothing to measure: return the data unchanged
        return data
    }

// Depth-first search for every node of the given type.
// Must be called inside editorState.read(), because it uses $getRoot().
function getNodesByType<T extends LexicalNode>(type: string): T[] {
    const root = $getRoot()
    const matchingNodes: T[] = []

    function traverse(node: LexicalNode) {
        if (node.getType() === type) {
            matchingNodes.push(node as T)
        }

        if ($isElementNode(node)) {
            for (const child of node.getChildren()) {
                traverse(child)
            }
        }
    }

    traverse(root)
    return matchingNodes
}

And that's all! With this hook in place, the read time is recalculated whenever a document is saved, and it reflects all of the text, including the text inside custom blocks.