ProductPromotion
Logo

Kotlin

made by https://0x3d.site

GitHub - ParserKt/ParserKt: Naive one-pass recursive descent, scannerless parser framework for Kotlin
Naive one-pass recursive descent, scannerless parser framework for Kotlin - ParserKt/ParserKt
Visit Site

GitHub - ParserKt/ParserKt: Naive one-pass recursive descent, scannerless parser framework for Kotlin

GitHub - ParserKt/ParserKt: Naive one-pass recursive descent, scannerless parser framework for Kotlin

ParserKt

Introduction

ParserKt is a naive one-pass recursive descent, scannerless parser framework for Kotlin (mainly JVM, compatible for JS)

A parser is a kind of program extracting data from input sequences:

NOTE: Using REPL from command line is not a good practice, you can create a project with dependency on JitPack or use IDEA Kotlin REPL instead.

git clone https://github.com/ParserKt/ParserKt.git && cd ParserKt
gradle shadowJar
kotlinc -cp build/libs/ParserKt-*.jar

Don't be scared! ParserKt is just a small library, even full-build won't take too much time. (or you can download pre-built jars here)

// Pattern.read(String), CharInput.STDIN, ...
// item(x), elementIn('a'..'z'), satisfy<Int> { it % 2 == 0 }, ...
// asList(), asString(), ...
import org.parserkt.*
import org.parserkt.util.*
import org.parserkt.pat.*

Import parserkt, util, pat, then we can implement our own combined pattern! (for a more in depth sample, see link at header)

val digits = Repeat(asList(), elementIn('0'..'9')) //{[0-9]}
digits.read("123") //[1, 2, 3]
digits.read("12a3") //[1, 2]
digits.read("a3") //null //(= notParsed)

A parser mainly recognize, and transform inputs into a simpler form (like AST) for post-processing (preetify, evaluate, type-check, ...), and it can be used to create (DSL) language tools / compilers / transpilers / interpreters.

// Generic parsing for [1, a, b] where (b > a)
val ints = Seq(::IntTuple, item(1), *Contextual(item<Int>()) { i -> satisfy<Int> { it > i } }.flatten().items() )
ints.rebuild(1,2,3) //[1, 2, 3]
ints.rebuild(1,2,1) //notParsed
i.rebuild(1,0,1) //[1, 0, 1]
// Using Convert patterns
fun digitItem(digit: SatisfyPattern<Char>) = Convert(digit, {it-'0'}, {'0'+it})
val digit = digitItem(elementIn('0'..'9'))
val digitsInt = Repeat(JoinFold(0) { this*10 + it }, digit)
digitsInt.read("233") //233

(click here if you want to try out more)

ParserKt divides complex grammar into simple subpart val definitions. "entry rule" could be all public structure — like item, elementIn, Repeat, and thus it's very easy to debug, and to reuse implemented syntax.

What means "one-pass"?

See FeedModel.kt:

interface Feed<out T> {
  val peek: T; fun consume(): T
  class End: NoSuchElementException("no more")
}

That's all one subparser can see, no mark/reset, not even one character supported for lookahead.

What means "recursive descent"?

Take a look of these expression: a + b, a + (b + c)

sealed class PlusAst {
  data class Plus(val left: PlusAst, val right: PlusAst): PlusAst()
  data class IntLiteral(val n: Int): PlusAst()
}

(sealed class are just classes with determinable subclass, in their inner name scope)

This data structure implies, every symbol of a + b could be an integer (like 0, 9, 16, ...IntLiteral(n)), or other a + b (1 + 2, 9 + 0, ...Plus(left, right))

PlusAst is recursive, so it's best to implement it's parser in recurse function form, that's "recursive".

Plus := (Plus '+' Plus) | Int  ; left associative
Int := {[0-9]}

({a} denotes repeat, a|b denotes alternative)

When reading input "1+2+3" by rule Plus, results Plus(Plus(Int(1), Int(2)), Int(3)).

Parser keep going to the state of reading substructure Int from reading Plus.

From rule Plus to its subrule Int, that's "descent".

——

ParserKt cannot make any lookaheads, so it's looks like impossible to parse syntax like //, /*, or 0x 0b (and error when 0(octal)).

In fact, "lookahead" can be stored in call stack of Pattern.read by pattern Contextual, so write parser for such syntax is possible (but much more complicated, so it's better to create new Pattern subclass, or fall back to tokenizer-parser LexerFeed, or use TriePattern)

What is the difference between "parser combinator" and "parser compiler"?

Let me clarify first: I really don't know what "combinator" is :(

I think combinators must be related to "functional programming", or at least "declarative programming" — Define what it is, not how computers actually solve it.

Parser combinator is mainly about code reuse, parser compiler is mainly about pattern matching algorithms.

Parser compilers and parser combinators solves the same problem in different ways. A parser compiler have better portability, and better performance (I'm not sure). A parser combinator integrates better with the host language, and it's really easy to write/debug, since it's just hand-wrote program.

For example, keywords about parser compilers: LL(k), LR(k), LALR, NFA, DFA, KMP, Aho–Corasick

:fearful: Scared? Recursive descent method are the most popular practice of handwriting parser used by many well-known projects like Lua, and it's also the best choice for code quality — source files generated by parser compilers offen a mess!

Features

  • Minimal boilerplate code & high reusability
  • DSL style & fully object-oriented library design
  • Rebuild parse result back to input (REPL friendly)
  • Generic parsing input (Char, String, XXXToken, ...)
  • One-pass input stream design without hasNext
  • ReaderFeed for implementating interactive shells (JVM only)
  • The framework manages parsed data in a flexible and extensible way (Tuple2, Sized-Tuple, Fold)
  • Supports complex contextual syntax structure that cannot be supported directly in parser compilers (LayoutPattern, NumUnitPattern, ...)
  • Extensible error messages using clam {"message"}, clamWhile(pat, defaultValue) {"message"}
  • Parser input stream Feed can have state storage: withState(T), stateAs<T>()
  • <500K compiled JVM library, no extra runtime needed (except kotlin-stdlib, Kotlin sequences are optional)
  • No magics, all code has been rewritten at least 9 times by the author, ugly/ambiguous/disordered codes are removed

Great thanks to Kotlin, for its strong expressibility and amazing type inference.

Provided combinators

Modules

  • :parserkt for Feed/Input model and basic (org.parserkt.pat) / complex (org.parserkt.pat.complex) patterns
  • :parserkt-util Fundamental data types (Slice, Tuple, Trie, ...) and helper class (Preety, AnyBy, ...) used/prepared for ParserKt and friends
  • :parserkt-ext Extension library for real-world parsers, eliminating boilerplate code (LayoutPattern, LexicalBasics, ...)

abbreviations

  • POPCorn (Pattern, OptionalPattern, PatternWrapper, ConstantPattern)
  • SURDIES (Seq, Until, Repeat, Decide, item, elementIn, satisfy)
  • CCDP (Convert, Contextual, Deferred, Piped)
  • SJIT (SurroundBy, JoinBy, InfixPattern, TriePattern)

More runnable REPL snippets

Note that for frequently-used pattern combinations, we have org.parserkt.pat.ext:

// Using pat.ext and LexicalBasics
import org.parserkt.pat.ext.*
val digitsInt = Repeat(asInt(), LexicalBasics.digitFor('0'..'9'))
digitsInt.read("180") //180
// Implementing complex pattern
import org.parserkt.pat.complex.*
// Patterns with constant values are OK to perform rebuild, even ignored in parse result
val letter = elementIn('A'..'Z', 'a'..'z', '0'..'9') or item('_')
val sep = elementIn(',',':').toConstant(',') named "sep"
val xsv = JoinBy(sep, Repeat(asString(), letter)) // try replace letter using !sep
xsv.read("monkey,banana,melon") //Tuple2(first=[monkey, banana, melon], second=[,, ,])

xsv.concatCharJoin().read("猴:雀:瓜") //Tuple2(first=[猴, 雀, 瓜], second=::)
val goodXsv = xsv.mergeConstantJoin()
goodXsv.read("she,her,herself") //[she, her, herself]
goodXsv.rebuild("she,her,herself") //she,her,herself

ParserKt provides mergeFirst/Second, discardFirst/Second, flatten for Tuple2 pattern converting :kissing_heart:

import org.parserkt.*
import org.parserkt.util.*
import org.parserkt.pat.*
import org.parserkt.pat.complex.*

val dict = TriePattern<Char, String>().apply {
  mergeStrings("hello" to "こんにちは")
  mergeStrings("world" to "世界")
}
val noun = Repeat(asList(), dict)
noun.read("helloworld") //[こんにちは, 世界]

val pharse = JoinBy(Decide(elementIn('0'..'9'), elementIn(' ', '\t', '\n', '\r')), dict)
pharse.read("hello6world hello") //Tuple2(first=[こんにちは, 世界, こんにちは], second=[Tuple2(first=0, second=6), Tuple2(first=1, second= )])

It's OK (and also easy) to read recursive structure, just defined lateinit var and use Deferred{name} to reference:

// Back converts (third argument for Convert) are optional
sealed class Sexp { data class Term(val name: String): Sexp(); data class Nest(val list: List<Sexp>): Sexp() }
lateinit var sexp: Pattern<Char, Sexp>
val str = Repeat(asString(), !elementIn(' ', '(', ')'))
val atom = Convert(str, { Sexp.Term(it) }, { it.name })

val nestItems = SurroundBy(parens.toCharPat(), JoinBy(item(' '), Deferred{sexp}).mergeConstantJoin())
val nest = Convert(nestItems, { Sexp.Nest(it) }, { it.list })
sexp = Decide(nest, atom).mergeFirst { if (it is Sexp.Nest) 0 else 1 }

sexp.read("((monkey banana) (deep parser))") //Nest(list=[Nest(list=[Term(name=monkey), Term(name=banana)]), Nest(list=[Term(name=deep), Term(name=parser)])])
sexp.rebuild("((monkey banana) (deep parser))")  //((monkey banana) (deep parser))

NOTE: when using pattern Until, think if it can be expressed by Repeat(..., !SatisfPattern) first

References

References on Historic

(script for GitHub pages)

More Resources
to explore the angular.

mail [email protected] to add your project or resources here 🔥.

Related Articles
to learn about angular.

FAQ's
to learn more about Angular JS.

mail [email protected] to add more queries here 🔍.

More Sites
to check out once you're finished browsing here.

0x3d
https://www.0x3d.site/
0x3d is designed for aggregating information.
NodeJS
https://nodejs.0x3d.site/
NodeJS Online Directory
Cross Platform
https://cross-platform.0x3d.site/
Cross Platform Online Directory
Open Source
https://open-source.0x3d.site/
Open Source Online Directory
Analytics
https://analytics.0x3d.site/
Analytics Online Directory
JavaScript
https://javascript.0x3d.site/
JavaScript Online Directory
GoLang
https://golang.0x3d.site/
GoLang Online Directory
Python
https://python.0x3d.site/
Python Online Directory
Swift
https://swift.0x3d.site/
Swift Online Directory
Rust
https://rust.0x3d.site/
Rust Online Directory
Scala
https://scala.0x3d.site/
Scala Online Directory
Ruby
https://ruby.0x3d.site/
Ruby Online Directory
Clojure
https://clojure.0x3d.site/
Clojure Online Directory
Elixir
https://elixir.0x3d.site/
Elixir Online Directory
Elm
https://elm.0x3d.site/
Elm Online Directory
Lua
https://lua.0x3d.site/
Lua Online Directory
C Programming
https://c-programming.0x3d.site/
C Programming Online Directory
C++ Programming
https://cpp-programming.0x3d.site/
C++ Programming Online Directory
R Programming
https://r-programming.0x3d.site/
R Programming Online Directory
Perl
https://perl.0x3d.site/
Perl Online Directory
Java
https://java.0x3d.site/
Java Online Directory
Kotlin
https://kotlin.0x3d.site/
Kotlin Online Directory
PHP
https://php.0x3d.site/
PHP Online Directory
React JS
https://react.0x3d.site/
React JS Online Directory
Angular
https://angular.0x3d.site/
Angular JS Online Directory