Rich Text Fields: Built-in "Cleanse Formatting" button to sanitize pasted Word/third-party content
When applicants paste content from Microsoft Word or scientific journals into Rich Text fields, the editor silently injects hundreds of problematic HTML artifacts — deeply nested <span> tags with inline styles (font-family, font-size, color, border, background), Microsoft Office markup (mso-* properties, <o:p> namespace tags), Word bookmark anchors (<a name="_Hlk...">), and redundant wrapper elements.
From the applicant's perspective, the content looks fine in the editor. The problems surface downstream: PDF conversions break or produce corrupted output, field rendering becomes inconsistent, and in some cases, the Rich Text field functions themselves stop working properly. The applicant has no way to know their paste introduced bad markup, and no tools within the editor to fix it.
This is especially common in grant management and research funding workflows, where applicants routinely draft in Word and paste scientific content containing superscripts, subscripts, special characters, and complex formatting — all of which generate excessive hidden markup on paste.
Proposed Solution:
Add a native "Cleanse Formatting" button to the KendoUI Rich Text editor toolbar that strips unsafe/problematic markup while preserving meaningful formatting. Specifically, it should:
Strip: All inline styles, class/id/data attributes, <font> tags, empty/redundant <span> and <div> wrappers, Word bookmark anchors, XML namespace tags, HTML comments, base64-embedded images, and Microsoft Office conditional comments
Preserve: Semantic formatting (bold, italic, underline, strikethrough), superscripts/subscripts (critical for scientific notation), lists, links, tables (with colspan/rowspan), headings, and standard images
This gives applicants a one-click way to "clean up" their pasted content without losing the structural formatting they care about.
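As a rough illustration of the Strip/Preserve rules above, here is a minimal allowlist sanitizer written in Python using only the standard library. It is a sketch, not the Browser Script described below and not a SmartSimple or Kendo UI API. The names `Cleanser`, `cleanse`, `ALLOWED_TAGS` and `ALLOWED_ATTRS` are hypothetical, and the exact tag and attribute allowlists are assumptions inferred from the Preserve list.

```python
from html import escape
from html.parser import HTMLParser

# Assumed allowlist, inferred from the "Preserve" list; adjust as needed.
ALLOWED_TAGS = {
    "p", "br", "b", "strong", "i", "em", "u", "s", "strike", "del",
    "sup", "sub", "ul", "ol", "li", "a", "img",
    "table", "thead", "tbody", "tfoot", "tr", "td", "th",
    "h1", "h2", "h3", "h4", "h5", "h6",
}
# Only these attributes survive; style, class, id and data-* are dropped.
ALLOWED_ATTRS = {
    "a": {"href"},
    "td": {"colspan", "rowspan"},
    "th": {"colspan", "rowspan"},
    "img": {"src", "alt"},
}
VOID_TAGS = {"br", "img"}               # never emit a closing tag for these
DROP_WITH_CONTENT = {"style", "script"}  # remove the tag and everything in it


class Cleanser(HTMLParser):
    """Hypothetical sanitizer: keeps allowlisted tags, unwraps the rest."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0    # > 0 while inside <style> or <script>
        self.anchor_kept = []  # one flag per open <a>: was it emitted?

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth:
            return
        attrs = dict(attrs)
        if tag == "a":
            # Word bookmark anchors (<a name="_Hlk...">) have no href.
            keep = bool(attrs.get("href"))
            self.anchor_kept.append(keep)
            if not keep:
                return
        if tag == "img":
            src = (attrs.get("src") or "").strip().lower()
            if src.startswith("data:"):  # base64-embedded image
                return
        if tag not in ALLOWED_TAGS:
            return  # <span>, <div>, <font>, <o:p>, ...: unwrap, keep text
        allowed = ALLOWED_ATTRS.get(tag, set())
        kept = ""
        for name, value in attrs.items():
            if name in allowed:
                kept += ' {}="{}"'.format(name, escape(value or ""))
        self.out.append("<{}{}>".format(tag, kept))

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag in VOID_TAGS:
            return
        if tag == "a" and not (self.anchor_kept and self.anchor_kept.pop()):
            return  # closing tag of a dropped bookmark anchor (or a stray </a>)
        if tag in ALLOWED_TAGS:
            self.out.append("</{}>".format(tag))

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))

    # HTML comments, including Office conditional comments, are dropped
    # because HTMLParser.handle_comment does nothing by default.


def cleanse(html):
    """Return html with non-allowlisted tags and attributes removed."""
    parser = Cleanser()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For example, `cleanse('<p class="MsoNormal"><span style="font-family:Calibri">H<sub>2</sub>O<o:p></o:p></span></p>')` returns `<p>H<sub>2</sub>O</p>`. One limitation of this sketch is that formatting expressed only through inline styles (for instance bold applied as `font-weight:bold` on a `<span>`) is lost. A production version would probably want to map such styles to semantic tags before stripping them.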
Evidence — Proof of Concept:
We built and deployed a Browser Script that does exactly this. It injects a "Cleanse" button into every Rich Text editor toolbar on the page, uses the SmartSimple RichText API to read and set content, and runs a full DOM-based sanitizer. In real-world testing against scientific grant application content pasted from Word, it removed 150 to more than 200 problematic tags per field while preserving all meaningful formatting, including superscript citation numbers, gene notation in italics, and bolded figure references. PDF conversion issues were fully resolved after cleansing.
We are looking at other ways to support content formatting and at enhanced ways to work with Word, but not natively within SmS. We have no immediate plans to support this use case.