` node – no need for a hand‑rolled RegExp.
```dart
import 'package:html/parser.dart' as html; // 0.15.0
import 'package:html/dom.dart' as dom;

/// Convert an HTML string to plain text.
String parseHtml(String htmlString) {
  // 1️⃣ Build a DOM tree from the string
  final dom.Document document = html.parse(htmlString);
  // 2️⃣ Grab the textual content of the body
  final String text = document.body?.text ?? '';
  // 3️⃣ Return the clean string
  return text;
}
```
Run the function with the article you pasted and you’ll get a long,
plain‑text string – all tags removed and entities decoded.
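
As a quick sanity check, here is a minimal sketch of calling the function on a tiny snippet. The `sample` string is made up for illustration (it is not the article), and the function is repeated in its short form so the block runs on its own, assuming the `html` package is in your `pubspec.yaml`.

```dart
import 'package:html/parser.dart' as html;

/// Same behaviour as parseHtml above, written as a single expression.
String parseHtml(String htmlString) =>
    html.parse(htmlString).body?.text ?? '';

void main() {
  // Hypothetical input: one entity and one tag that is never closed.
  const sample = '<h1>Tom &amp; Jerry</h1><p>An <b>unclosed tag';

  // The parser decodes &amp; and closes the dangling tags for us.
  print(parseHtml(sample)); // Tom & JerryAn unclosed tag
}
```

Note that the text of adjacent elements is joined as-is: nothing is inserted between the `<h1>` and the `<p>`, so you may want to post-process whitespace yourself.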
---
Why this works
| What the function does | Why it is safe & correct |
|------------------------|--------------------------|
| **`html.parse`** parses the string into a real DOM tree | It follows the HTML5 parsing algorithm, so malformed or oddly nested tags are recovered the way a browser would recover them instead of tripping up a pattern match. |
| **`document.body?.text`** walks the tree and concatenates the text nodes | All tags are dropped, whitespace inside text nodes is kept exactly as written, and named entities (`&amp;`, `&lt;`, …) have already been decoded by the parser. |
| Return a `String` | No need to manipulate byte buffers or escape codes. |
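
Two practical consequences of "concatenate every text node" are worth a sketch: the result keeps the source's line breaks and indentation, and it also contains the contents of any `<script>` or `<style>` element that sits inside the body. The helper below is one possible way to handle both, not part of the original answer; the name `toPlainText` and the `sample` input are assumptions made for illustration.

```dart
import 'package:html/parser.dart' as html;

/// Hypothetical helper: plain text without script/style content,
/// with runs of whitespace collapsed to a single space.
String toPlainText(String htmlString) {
  final document = html.parse(htmlString);

  // .text would include the source of these elements, so remove them first.
  for (final element in document.querySelectorAll('script, style')) {
    element.remove();
  }

  final raw = document.body?.text ?? '';
  // Collapse newlines, tabs and repeated spaces, then trim the ends.
  return raw.replaceAll(RegExp(r'\s+'), ' ').trim();
}

void main() {
  const sample = '<p>Hello,\n   world</p><script>var x = 1;</script>';
  print(toPlainText(sample)); // Hello, world
}
```

Whether to collapse whitespace is a design choice: it suits search indexing or previews, but it discards the line structure of `<pre>` blocks.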
---
Optional – a lightweight “no‑dependency” version
If you would rather not pull in a full HTML parser, you can do a very
basic tag removal with a regex (the snippet still relies on the small
`html_unescape` package to decode entities).
It works for simple, well‑formed HTML but will fail on corner cases.
```dart
import 'package:html_unescape/html_unescape.dart';

final _htmlUnescape = HtmlUnescape();

String parseHtmlRegEx(String htmlString) {
  // Remove everything that looks like a tag
  final String noTags = htmlString.replaceAll(RegExp(r'<[^>]*>'), '');
  // Convert entities back to characters
  return _htmlUnescape.convert(noTags);
}
```
This is fine for quick demos, but for production‑ready code the
official `html` parser is the best choice.
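
To make the corner-case warning concrete, here is a small sketch that uses only `dart:core`. The `stripTags` helper and the `sample` input are hypothetical, chosen for illustration: a `>` inside an attribute value is enough to end the regex match too early.

```dart
/// Hypothetical helper: the same tag-stripping pattern as above,
/// without the entity-decoding step.
String stripTags(String input) => input.replaceAll(RegExp(r'<[^>]*>'), '');

void main() {
  // The '>' inside the title attribute is legal HTML.
  const sample = '<a title="1>0">link</a>';

  // The pattern stops at the first '>', so part of the attribute leaks out.
  print(stripTags(sample)); // 0">link
}
```

A real parser tokenizes attribute values before looking for the end of the tag, which is why the same input yields just `link` when run through `html.parse`.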
---
Example of output
Running the `parseHtml` function on the huge article you pasted will
produce something like:
Title: The …
...
(Truncated in this answer – the full output is a single block of
plain text containing every heading, paragraph, list item, etc.)
---
TL;DR
```dart
import 'package:html/parser.dart' as html;

String parseHtml(String htmlString) =>
    html.parse(htmlString).body?.text ?? '';
```
That’s it – a single expression that turns an HTML snippet into clean
plain text.