GitHub - fleeksoft/ksoup: Ksoup is a Kotlin Multiplatform library for working with HTML and XML. It's a port of the renowned Java library Jsoup.

Ksoup: Kotlin Multiplatform HTML & XML Parser

Ksoup is a Kotlin Multiplatform library for working with real-world HTML and XML. It's a port of the renowned Java library, jsoup, and offers an easy-to-use API for URL fetching, data parsing, extraction, and manipulation using DOM and CSS selectors.

License: Apache-2.0. Published on Maven Central.

Supported platforms: Android, iOS, macOS, tvOS, JVM, Linux, Windows, JS, Wasm

Ksoup implements the WHATWG HTML5 specification, parsing HTML to the same DOM as modern browsers do, but with support for Android, JVM, and native platforms.

Features

  • Scrape and parse HTML from a URL, file, or string
  • Find and extract data using DOM traversal or CSS selectors
  • Manipulate HTML elements, attributes, and text
  • Clean user-submitted content against a safe-list to prevent XSS attacks
  • Output tidy HTML

Ksoup is adept at handling all varieties of HTML found in the wild.

Getting started

Ksoup is published on Maven Central. Include the dependency in commonMain, replacing <version> with the latest version listed on Maven Central.

Ksoup is published in five variants. Pick the one that suits your needs and start building!

  1. This variant is built without any external IO or Network dependencies. Use this if you want to parse HTML from a string.

    implementation("com.fleeksoft.ksoup:ksoup-lite:<version>")
    
  2. This variant is built with kotlinx-io and Ktor 3.

    implementation("com.fleeksoft.ksoup:ksoup:<version>")
    
     // Optional: Include only if you need to use network request functions such as
     // Ksoup.parseGetRequest, Ksoup.parseSubmitRequest, and Ksoup.parsePostRequest
    implementation("com.fleeksoft.ksoup:ksoup-network:<version>")
    
  3. This variant is built with korlibs-io

    implementation("com.fleeksoft.ksoup:ksoup-korlibs:<version>")
    
     // Optional: Include only if you need to use network request functions such as
     // Ksoup.parseGetRequest, Ksoup.parseSubmitRequest, and Ksoup.parsePostRequest
    implementation("com.fleeksoft.ksoup:ksoup-network-korlibs:<version>")
    
  4. This variant is built with kotlinx-io and Ktor 2.

    implementation("com.fleeksoft.ksoup:ksoup-ktor2:<version>")
    
     // Optional: Include only if you need to use network request functions such as
     // Ksoup.parseGetRequest, Ksoup.parseSubmitRequest, and Ksoup.parsePostRequest
    implementation("com.fleeksoft.ksoup:ksoup-network-ktor2:<version>")
    
  5. This variant is built with okio and Ktor 2.

    implementation("com.fleeksoft.ksoup:ksoup-okio:<version>")
    
     // Optional: Include only if you need to use network request functions such as
     // Ksoup.parseGetRequest, Ksoup.parseSubmitRequest, and Ksoup.parsePostRequest
    implementation("com.fleeksoft.ksoup:ksoup-network-ktor2:<version>")
    

    NOTE: Variants built with kotlinx-io do not support gzip files.
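
As a rough sketch of where these lines go, the fragment below places the kotlinx-io and Ktor 3 variant in the commonMain source set of a module-level build.gradle.kts. It assumes the Kotlin Multiplatform Gradle plugin is already applied and your targets are declared elsewhere in the file. The <version> placeholder is kept as-is and must be replaced with a published version.

```kotlin
// build.gradle.kts (module level): a sketch, assuming the Kotlin Multiplatform
// plugin is applied and targets (jvm(), iosArm64(), ...) are declared elsewhere.
kotlin {
    sourceSets {
        commonMain.dependencies {
            // Core library: replace <version> with a version from Maven Central.
            implementation("com.fleeksoft.ksoup:ksoup:<version>")

            // Optional: only needed for the network request functions.
            // implementation("com.fleeksoft.ksoup:ksoup-network:<version>")
        }
    }
}
```

To use a different variant, swap the artifact names for the ones listed above. The surrounding structure stays the same.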

Parsing HTML from a String with Ksoup

For API documentation, refer to the jsoup documentation. Most of the jsoup APIs work in Ksoup without any changes.

import com.fleeksoft.ksoup.Ksoup
import com.fleeksoft.ksoup.nodes.Document

val html = "<html><head><title>One</title></head><body>Two</body></html>"
val doc: Document = Ksoup.parse(html = html)

println("title => ${doc.title()}") // One
println("bodyText => ${doc.body().text()}") // Two

This snippet demonstrates how to use Ksoup.parse for parsing an HTML string and extracting the title and body text.
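
Beyond the title and body text, a parsed Document can also be queried with CSS selectors. The sketch below is illustrative only. The helper name extractLinks and the sample markup are made up for this example, and the import paths assume Ksoup mirrors jsoup's package layout under com.fleeksoft.ksoup.

```kotlin
import com.fleeksoft.ksoup.Ksoup
import com.fleeksoft.ksoup.nodes.Document

// Hypothetical helper: collects (link text, href) pairs for every <a>
// element that actually carries an href attribute.
fun extractLinks(html: String): List<Pair<String, String>> {
    val doc: Document = Ksoup.parse(html = html)
    return doc.select("a[href]").map { link -> link.text() to link.attr("href") }
}

fun main() {
    val html = """
        <ul>
          <li><a href="/docs">Docs</a></li>
          <li><a href="/blog">Blog</a></li>
          <li><a>No link</a></li>
        </ul>
    """.trimIndent()

    // The third anchor has no href, so the selector skips it.
    extractLinks(html).forEach { (text, href) -> println("$text => $href") }
    // Docs => /docs
    // Blog => /blog
}
```

The attribute selector a[href] does the filtering here, so no null or empty-string checks are needed on the Kotlin side.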

Fetching and Parsing HTML from a URL using Ksoup

// The com.fleeksoft.ksoup:ksoup-network dependency is required for Ksoup.parseGetRequest.
val doc: Document = Ksoup.parseGetRequest(url = "https://en.wikipedia.org/") // suspend function
// Or, outside a coroutine, use the blocking variant instead:
// val doc: Document = Ksoup.parseGetRequestBlocking(url = "https://en.wikipedia.org/")

println("title: ${doc.title()}")
val headlines: Elements = doc.select("#mp-itn b a")

headlines.forEach { headline: Element ->
    val headlineTitle = headline.attr("title")
    val headlineLink = headline.absUrl("href")

    println("$headlineTitle => $headlineLink")
}

Parsing Metadata from Website

// The com.fleeksoft.ksoup:ksoup-network dependency is required for Ksoup.parseGetRequest.
val doc: Document = Ksoup.parseGetRequest(url = "https://en.wikipedia.org/") // suspend function
val metadata = Ksoup.parseMetaData(element = doc) // suspend function
// Or parse the metadata directly from a String that holds the page's HTML:
// val metadata = Ksoup.parseMetaData(html = html)

println("title: ${metadata.title}")
println("description: ${metadata.description}")
println("ogTitle: ${metadata.ogTitle}")
println("ogDescription: ${metadata.ogDescription}")
println("twitterTitle: ${metadata.twitterTitle}")
println("twitterDescription: ${metadata.twitterDescription}")
// Check com.fleeksoft.ksoup.model.MetaData for more fields

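In the fetching example, Ksoup.parseGetRequest fetches and parses HTML content from Wikipedia, then extracts and prints the news headlines and their corresponding links. The metadata example reads the page's title, description, Open Graph, and Twitter fields from the same document.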

Ksoup Public functions

  • Ksoup.parse
  • Ksoup.parseFile
  • Ksoup.clean
  • Ksoup.isValid
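
As a minimal sketch of the last two functions, the example below sanitizes untrusted markup against a safe-list. It assumes Ksoup.clean and Ksoup.isValid accept the same (bodyHtml, safelist) arguments as their jsoup counterparts, and that Safelist lives in com.fleeksoft.ksoup.safety. Check both against the version you use. The helper name sanitizeComment is hypothetical.

```kotlin
import com.fleeksoft.ksoup.Ksoup
import com.fleeksoft.ksoup.safety.Safelist

// Hypothetical helper: keeps only the simple text formatting allowed by
// Safelist.basic() and drops everything else, including <script> elements.
fun sanitizeComment(untrusted: String): String =
    Ksoup.clean(untrusted, Safelist.basic())

fun main() {
    val untrusted = "<p>Hello <script>alert('xss')</script><b>world</b></p>"

    // isValid reports whether the input already conforms to the safe-list.
    println("valid: ${Ksoup.isValid(untrusted, Safelist.basic())}")

    // clean returns a sanitized copy; the input string is not modified.
    println(sanitizeComment(untrusted))
}
```

Use isValid when you want to reject bad input outright, and clean when you would rather accept it after stripping the disallowed parts.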

Ksoup Network Public functions

  • Suspend functions
    • Ksoup.parseGetRequest
    • Ksoup.parseSubmitRequest
    • Ksoup.parsePostRequest
  • Blocking functions
    • Ksoup.parseGetRequestBlocking
    • Ksoup.parseSubmitRequestBlocking
    • Ksoup.parsePostRequestBlocking

For further documentation, see the jsoup documentation.

Ksoup vs. Jsoup Performance: Parsing & Selecting 448KB HTML File test.tx

[Figures: Ksoup vs Jsoup benchmark charts]

Open source

Ksoup is an open source project, a Kotlin Multiplatform port of jsoup, distributed under the Apache License, Version 2.0. The source code of Ksoup is available on GitHub.

Development and Support

For questions about usage and general inquiries, please refer to GitHub Discussions.

If you wish to contribute, please read the Contributing Guidelines.

To report an issue, visit our GitHub issues. Please check for duplicates before submitting a new issue.

Library Status

Platform         Status         Notes
Android          Stable
JVM              Stable
iOS              Stable
JS               Alpha
WasmJs           Alpha          Not supported with Ktor 2
Native MacOS     Alpha
Linux            Experimental
Native Windows   Experimental

License

Copyright 2023 Sabeeh Ul Hussnain

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
