GitHub - sokomishalov/skraper: Kotlin/Java library and cli tool for scraping posts and media from various sources with neither authorization nor full page rendering (Facebook, Instagram, Twitter, Youtube, Tiktok, Telegram, Twitch, Reddit, 9GAG, Pinterest, Flickr, Tumblr, Coub, Vimeo, IFunny, VK, Odnoklassniki, Pikabu)

Skraper

License: Apache License 2

Overview

Kotlin/Java library and cli tool for scraping and downloading posts, attachments and other metadata from more than 10 sources without any authorization or full page rendering. Based on jsoup, jackson and kotlin-coroutines.

Repository contains:

  • Kotlin library (also usable from plain Java)
  • cli tool
  • Telegram bot

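Current list of implemented sources: Facebook, Instagram, Twitter, Youtube, Tiktok, Telegram, Twitch, Reddit, 9GAG, Pinterest, Flickr, Tumblr, Coub, Vimeo, IFunny, VK, Odnoklassniki, Pikabu.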

Bugs

Unfortunately, any website may change without notice, so the tool may stop working correctly for that source. If that happens, please let me know via an issue.

Cli tool

The cli tool allows you to:

  • download media from almost all of the listed sources with the --media-only flag
  • scrape post metadata

Requirements:

  • Java: 1.8+
  • Maven (optional)

Build the tool:

./mvnw clean package -DskipTests=true 

Usage:

./skraper --help
usage: [-h] PROVIDER PATH [-n LIMIT] [-t TYPE] [-o OUTPUT] [-m]
       [--parallel-downloads PARALLEL_DOWNLOADS]

optional arguments:
  -h, --help                                show this help message and exit

  -n LIMIT, --limit LIMIT                   posts limit (50 by default)

  -t TYPE, --type TYPE                      output type, options: [log, csv, json, xml, yaml]

  -o OUTPUT, --output OUTPUT                output path

  -m, --media-only                          scrape media only

  --parallel-downloads PARALLEL_DOWNLOADS   amount of parallel downloads for media items if
                                            enabled flag --media-only (4 by default)


positional arguments:
  PROVIDER                                  skraper provider, options: facebook, instagram,
                                            twitter, youtube, tiktok, telegram, twitch, reddit,
                                            9gag, pinterest, flickr, tumblr, ifunny, vk, pikabu,
                                            vimeo, odnoklassniki, coub

  PATH                                      path to user/community/channel/topic/trend

Examples:

./skraper 9gag /hot 
./skraper reddit /r/memes -n 5 -t csv -o ./reddit/posts
./skraper instagram /explore/tags/memes -t json
./skraper flickr /photos/harrythehawk -t yaml
./skraper pinterest /levato/meme -t xml
./skraper youtube /user/JetBrainsTV/videos --media-only -n 2

Kotlin Library

Distribution

Maven:

<dependency>
    <groupId>ru.sokomishalov.skraper</groupId>
    <artifactId>skrapers</artifactId>
    <version>x.y.z</version>
</dependency>

Gradle Kotlin DSL:

implementation("ru.sokomishalov.skraper:skrapers:x.y.z")

Usage

Instantiate specific scraper

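As mentioned before, there is a provider implementation for each supported source. The examples in this document use InstagramSkraper, FacebookSkraper and TwitterSkraper.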

After that, usage is as simple as:

val skraper = InstagramSkraper(client = OkHttpSkraperClient())

Important: using DefaultBlockingSkraperClient is strongly discouraged. There are more efficient, non-blocking and resource-friendly implementations of SkraperClient. To use them, you only have to put the required dependencies on the classpath.

The http-client implementations named in this document are DefaultBlockingSkraperClient and OkHttpSkraperClient.

Available methods

Each scraper is a class that implements the Skraper interface:

interface Skraper {
    val client: SkraperClient
    fun getPosts(path: String): Flow<Post>
    suspend fun getPageInfo(path: String): PageInfo?
    fun supports(media: Media): Boolean
    suspend fun resolve(media: Media): Media
}

There are also some provider-specific Kotlin extensions for the implementations. You can find them in the corresponding provider implementation package.
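
The supports/resolve pair in the interface suggests a simple dispatch pattern: ask each scraper whether it recognises a media item, then hand the item to the first one that does. The sketch below illustrates that idea with stand-in types only. MediaStub, ResolverStub, HostResolver and resolveWithFirstSupporting are hypothetical names, the functions are non-suspending for brevity, and nothing here is taken from skraper's actual implementation.

```kotlin
// Stand-in types for illustration only; these are NOT skraper classes.
data class MediaStub(val url: String)

interface ResolverStub {
    fun supports(media: MediaStub): Boolean
    fun resolve(media: MediaStub): MediaStub
}

// Toy resolver: claims urls containing `host` and rewrites them to a fake direct link.
class HostResolver(private val host: String) : ResolverStub {
    override fun supports(media: MediaStub) = host in media.url
    override fun resolve(media: MediaStub) = MediaStub("https://cdn.example/$host/direct")
}

// Returns the resolved media from the first resolver that supports it, or null if none does.
fun resolveWithFirstSupporting(resolvers: List<ResolverStub>, media: MediaStub): MediaStub? =
    resolvers.firstOrNull { it.supports(media) }?.resolve(media)

fun main() {
    val resolvers = listOf(HostResolver("instagram.com"), HostResolver("youtu.be"))
    println(resolveWithFirstSupporting(resolvers, MediaStub("https://youtu.be/fjUO7xaUHJQ")))
    println(resolveWithFirstSupporting(resolvers, MediaStub("https://example.org/unknown")))
}
```

Checking supports() first lets a caller skip a resolver without paying for a network request on media it cannot handle.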

Usage from plain Java

An out-of-the-box Java interop utility class is available, ru.sokomishalov.skraper.util.JavaInterop:

class Example {
    public static void main(String[] args) {
        Skraper skraper = new InstagramSkraper();
        List<Post> posts = JavaInterop.limitedFlow(skraper.getPosts("/memes.video"), 10);
        PageInfo info = JavaInterop.callBlocking(cont -> skraper.getPageInfo("/memes.video", cont));
    }
}

Scrape user/community/channel/topic/trend posts

To scrape the latest posts for a specific user, channel or trend, use skraper like this:

suspend fun main() {
    val skraper = FacebookSkraper()
    val posts = skraper.getUserPosts(username = "memes").take(2).toList() // extension for getPosts()
    // or 
    val postsDetected = Skrapers.getPosts(url = "https://facebook.com/memes") // aggregating singleton
    println(JsonMapper().writerWithDefaultPrettyPrinter().writeValueAsString(posts))
}

The structure of the returned data is similar across providers. Example output:

[
  {
    "id": "5029851093699104",
    "text": "gotta love em!",
    "publishedAt": 1580744400000,
    "statistics": {
      "likes": 79,
      "comments": 3
    },
    "media": [
      {
        "url": "https://facebook.com/memes/posts/5029851093699104?__xts__%5B0%5D=68.ARA2yRI2YnlXQRKX7Pdphh8ztgvnP11aYE_bZFPNmqLpJZLhwJaG24gDPUTiKDLv-J_E09u2vLjCXalpmEuGSmVR0BkVtcng_i6QV8x5e-aZUv0Mkn1wwKLlhp5NNH6zQWKlqDqRjZrwvcKeUi0unzzulRCHRvDIrbz2leM6PLescFySwMYbMmKFc7ctqaC_F7nJ09Ya0lz9Pqaq_Rh6UsNKom6fqdgHAuoHV894a3QRuyY0BC6fQuXZLOLbRIfEVK3cF9Z5UQiXUYruCySF-WpQEV0k72x6DIjT6B3iovYFnBGHaji9VAx2PByZ-MDs33D1Hz96Mk-O1Pj7zBwO6FvXGhkUJgepiwUOVd0q-pV83rS5EhjtPFDylNoNO2xkDUSIi483p49vumVPWtmab8LX1V6w2anf55kh6pedCXcH3D8rBjz8DaTBnv995u9kk5im-1-HdAGQHyKrCZpaA0QyC-I4oGsCoIJGck3RO8u_SoHcfe2tKjTgPe6j9p1D&__tn__=-R",
        "aspectRatio": 0.864,
        "duration": 10860.000000000
      }
    ]
  },
  {
    "id": "4990218157662398",
    "text": "Interesting",
    "publishedAt": 1580742000000,
    "statistics": {
      "likes": 3092,
      "comments": 514
    },
    "media": [
      {
        "url": "https://scontent.fhrk1-1.fna.fbcdn.net/v/t1.0-0/p526x296/52333452_10157743612509879_529328953723191296_n.png?_nc_cat=1&_nc_ohc=oNMb8_mCbD8AX-w9zeY&_nc_ht=scontent.fhrk1-1.fna&oh=ca8a719518ecfb1a24f871282b860124&oe=5E910D0C",
        "aspectRatio": 0.8960573476702509
      }
    ]
  }
]

The full model structure for posts and other entities can be found in the repository.
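
The publishedAt values in the sample output look like Unix timestamps in milliseconds. That reading is an assumption based on their magnitude, not something stated in this document. Under that assumption, a minimal sketch for turning them into readable UTC instants with java.time (the helper name publishedAtToInstant is hypothetical):

```kotlin
import java.time.Instant

// Assumption: `publishedAt` is milliseconds since the Unix epoch.
fun publishedAtToInstant(publishedAt: Long): Instant = Instant.ofEpochMilli(publishedAt)

fun main() {
    // Values copied from the sample output above.
    println(publishedAtToInstant(1580744400000)) // 2020-02-03T15:40:00Z
    println(publishedAtToInstant(1580742000000)) // 2020-02-03T15:00:00Z
}
```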

Scrape user/community/channel/topic/trend info

It is also possible to scrape information about a user, channel or trend page itself:

suspend fun main() {
    val skraper = TwitterSkraper()
    val pageInfo = skraper.getUserInfo(username = "memes") // extension for `getPageInfo()`
    // or 
    val pageInfoDetected = Skrapers.getPageInfo(url = "https://twitter.com/memes") // aggregating singleton
    println(JsonMapper().writerWithDefaultPrettyPrinter().writeValueAsString(pageInfo))
}

Output:

{
  "nick": "memes",
  "name": "Memes.com",
  "description": "http://memes.com is your number one website for the funniest content on the web. You will find funny pictures, funny memes and much more.",
  "statistics": {
    "posts": 10848,
    "followers": 154718
  },
  "avatar": {
    "url": "https://pbs.twimg.com/profile_images/824808708332941313/mJ4xM6PH_normal.jpg"
  },
  "cover": {
    "url": "https://abs.twimg.com/images/themes/theme1/bg.png"
  }
}

Resolve provider relative url

Sometimes you need the direct link to a media file:

suspend fun main() {
    val skraper = InstagramSkraper()
    val info = skraper.resolve(Video(url = "https://www.instagram.com/p/B-flad2F5o7/"))
    val serializer = JsonMapper().writerWithDefaultPrettyPrinter()
    println(serializer.writeValueAsString(info))
}

Output:

{
  "url": "https://scontent-amt2-1.cdninstagram.com/v/t50.2886-16/91508191_213297693225472_2759719910220905597_n.mp4?_nc_ht=scontent-amt2-1.cdninstagram.com&_nc_cat=104&_nc_ohc=27bC52qar_oAX-7J2Zh&oe=5EC0BC52&oh=0aafee2860c540452b76e7b8e336147d",
  "aspectRatio": 0.8010012515644556,
  "thumbnail": {
    "url": "https://scontent-amt2-1.cdninstagram.com/v/t51.2885-15/e35/91435498_533808773845524_5302421141680378393_n.jpg?_nc_ht=scontent-amt2-1.cdninstagram.com&_nc_cat=100&_nc_ohc=8gPAcByc6YAAX_kDBWm&oh=5edf6b9d90d606f9c0e055b7dbcbfa45&oe=5EC0DDE8",
    "aspectRatio": 0.8010012515644556
  }
}

Download media

There is a "static" method that downloads media from any of the implemented sources:

suspend fun main() {
    val tmpDir = Files.createTempDirectory("skraper").toFile()

    val testVideo = Skrapers.download(
        media = Video("https://youtu.be/fjUO7xaUHJQ"),
        destDir = tmpDir,
        filename = "Gandalf"
    )

    val testImage = Skrapers.download(
        media = Image("https://www.pinterest.ru/pin/89509111320495523/"),
        destDir = tmpDir,
        filename = "Do_no_harm"
    )

    println(testVideo)
    println(testImage)
}

Output:

/var/folders/sf/hm2h5chx5fl4f70bj77xccsc0000gp/T/skraper8377953374796527777/Gandalf.mp4
/var/folders/sf/hm2h5chx5fl4f70bj77xccsc0000gp/T/skraper8377953374796527777/Do_no_harm.jpg

Telegram bot

To use the bot, follow the link.
