Breaking Limits In Desperation

Parse the HTML with the `html` package and read the text of the `<body>` node; there is no need for a hand-rolled RegExp.

```dart
import 'package:html/parser.dart' as html; // 0.15.0
import 'package:html/dom.dart' as dom;

/// Convert an HTML string to plain text
String parseHtml(String htmlString) {
  // 1️⃣ Build a DOM tree from the string
  final dom.Document document = html.parse(htmlString);

  // 2️⃣ Grab the textual content of the body
  final String text = document.body?.text ?? '';

  // 3️⃣ Return the clean string
  return text;
}
```

Run the function with the article you pasted and you'll get a long, plain-text string with all tags removed and entities decoded.

---

Why this works

| What the function does | Why it is safe & correct |
|------------------------|--------------------------|
| **`html.parse`** parses the string into a real DOM tree | It follows HTML5 parsing rules, so malformed or nested tags are recovered the way a browser would recover them instead of tripping up a pattern match. |
| **`document.body?.text`** walks the tree and concatenates the text nodes | All tags drop out, whitespace is kept as it appears in the source, and named entities (`&amp;`, `&lt;`, …) have already been decoded by the parser. |
| Return a `String` | No need to manipulate byte buffers or escape codes. |

---
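To make the table concrete, here is a minimal sketch that feeds the parser a deliberately sloppy fragment. It assumes the `html` package is listed in `pubspec.yaml`; `toPlainText` is a hypothetical name used only for this illustration.

```dart
import 'package:html/parser.dart' as html;

/// Hypothetical helper: same idea as parseHtml, written as one expression.
String toPlainText(String input) => html.parse(input).body?.text ?? '';

void main() {
  // The <p> and <b> tags are never closed, and the text
  // contains two named entities.
  const input = '<p>Fish &amp; chips <b>cost &lt; 10';

  // The parser repairs the tree and decodes the entities.
  print(toPlainText(input)); // Fish & chips cost < 10
}
```

The unclosed tags do not cause an error here; the parser closes them implicitly while building the tree.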

Optional: a lightweight regex-based version

If you want to avoid a full HTML parser, you can do a very basic tag removal with a regex. It works for simple, well-formed HTML but will fail on corner cases. Entity decoding still relies on the small `html_unescape` package, so this version is not entirely dependency-free.

```dart
import 'package:html_unescape/html_unescape.dart';

final _htmlUnescape = HtmlUnescape();

String parseHtmlRegEx(String htmlString) {
  // Remove everything that looks like a tag
  final String noTags = htmlString.replaceAll(RegExp(r'<[^>]*>'), '');

  // Convert entities back to characters
  return _htmlUnescape.convert(noTags);
}
```

This is fine for quick demos, but for production-ready code the official `html` parser is the best choice.

---
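The kind of corner case that trips up the regex can be reproduced with `dart:core` alone. The sketch below uses a hypothetical `stripTags` helper that applies the same pattern without the entity-decoding step; the sample inputs are made up for illustration.

```dart
/// Hypothetical helper: the tag-stripping regex on its own.
String stripTags(String input) => input.replaceAll(RegExp(r'<[^>]*>'), '');

void main() {
  // Simple, well-formed markup: the tags are removed cleanly.
  print(stripTags('<p>Hello <b>world</b></p>')); // Hello world

  // A '>' inside an attribute value ends the match early,
  // so the tail of the attribute leaks into the output.
  print(stripTags('<a title="x > y">link</a>')); // prints ' y">link'
}
```

The pattern has no notion of quoted attribute values, which is why a real parser is the safer option for arbitrary input.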

Example of output

Running the `parseHtml` function on the huge article you pasted will produce something like:

```
Title: The …
...
```

(Truncated in this answer; the full output is a single block of plain text containing every heading, paragraph, list item, etc.)

---

TL;DR

```dart
import 'package:html/parser.dart' as html;

String parseHtml(String htmlString) =>
    html.parse(htmlString).body?.text ?? '';
```

That's it: a single expression that turns any HTML snippet into clean plain text.
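One thing worth checking if the input is a full page rather than a snippet: the contents of `<script>` and `<style>` elements are stored as text nodes, so they are likely to show up in `body.text` as well. The sketch below is one possible way to drop them first; `parseVisibleText` is a hypothetical name, and the approach assumes the `html` package's `querySelectorAll` and `remove` behave like their DOM counterparts.

```dart
import 'package:html/parser.dart' as html;

/// Hypothetical variant of parseHtml that removes <script> and <style>
/// elements before reading the body text.
String parseVisibleText(String htmlString) {
  final document = html.parse(htmlString);

  // Detach elements whose text content is code or CSS rather than prose.
  for (final tag in const ['script', 'style']) {
    for (final element in document.querySelectorAll(tag)) {
      element.remove();
    }
  }

  return document.body?.text ?? '';
}

void main() {
  const page = '<body><p>Hello</p><script>var x = 1;</script></body>';
  print(parseVisibleText(page)); // Hello
}
```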