Make TokenKind and ExpressionKind more generic. #10

Closed
opened 2023-08-28 01:22:11 +00:00 by zyxw59 · 7 comments
zyxw59 commented 2023-08-28 01:22:11 +00:00 (Migrated from github.com)

Currently TokenKind hard-codes the set of possible token kinds:

  • Tag
  • Integer
  • Float
  • String
  • Unterminated string

and similarly, ExpressionKind hard-codes the set of possible expression types:

  • Binary operator
  • Unary operator
  • Integer
  • Float
  • String
  • Variable
  • Null

These should really be made generic.

Some changes that will be necessary to support this:

  • Parser should delegate parsing of individual tokens to the ParseContext — i.e. if a particular parser has floats, it should provide a float parser. This basically amounts to replacing the match on kind in parse_term and parse_operator with matching on the result of [some method on ParseContext]; this should flatten with the existing match in the TokenKind::Tag branch.
  • ParseContext should add a new associated type: Term, which is a type that can represent a single term in the expression queue.
  • ParseErrorKind should allow for errors specific to a particular ParseContext (maybe it should have a generic Other variant, and ParseContext has an Error associated type?)
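A minimal sketch of how those three changes might fit together, assuming a heavily trimmed-down `ParseContext`. The method name `term_from_token` is hypothetical (it stands in for the "[some method on ParseContext]" above), as are the `UnexpectedToken` variant, `IntContext`, and `parse_one`; only `Term`, `Error`, and the `Other` variant come from the proposal itself.

```rust
use std::fmt::Debug;

/// Hypothetical error kind: one built-in variant plus a context-specific `Other`.
#[derive(Debug, PartialEq)]
enum ParseErrorKind<E> {
    UnexpectedToken,
    Other(E),
}

/// Trimmed-down context: it, not the parser, decides what a token's text means.
trait ParseContext {
    type Term: Debug;
    type Error: Debug;

    /// Hypothetical method. `None` means "this token is not a term in this context".
    fn term_from_token(&self, text: &str) -> Option<Result<Self::Term, Self::Error>>;
}

/// A context whose only terms are integers, so it only provides an integer parser.
struct IntContext;

impl ParseContext for IntContext {
    type Term = i64;
    type Error = std::num::ParseIntError;

    fn term_from_token(&self, text: &str) -> Option<Result<i64, Self::Error>> {
        // Claim only tokens that start with a digit; everything else is rejected.
        if text.starts_with(|c: char| c.is_ascii_digit()) {
            Some(text.parse())
        } else {
            None
        }
    }
}

/// The parser matches on the context's answer instead of on a fixed `TokenKind`.
fn parse_one<C: ParseContext>(
    ctx: &C,
    text: &str,
) -> Result<C::Term, ParseErrorKind<C::Error>> {
    match ctx.term_from_token(text) {
        Some(Ok(term)) => Ok(term),
        Some(Err(e)) => Err(ParseErrorKind::Other(e)),
        None => Err(ParseErrorKind::UnexpectedToken),
    }
}
```

A context that also has floats would add its own float parsing inside `term_from_token` and widen `Term` accordingly, without any change to `parse_one`.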
TokenKind: https://zyxw59.github.io/expr-parser/expr_parser/token/enum.TokenKind.html
ExpressionKind: https://zyxw59.github.io/expr-parser/expr_parser/expression/enum.ExpressionKind.html
zyxw59 commented 2023-08-29 01:44:16 +00:00 (Migrated from github.com)

augh so many generics/associated types

zyxw59 commented 2023-08-30 00:03:44 +00:00 (Migrated from github.com)

Ok so there are (at least) three stages in this library:

  1. tokenizer
  2. parser
  3. evaluator

The tokenizer and the parser have one type in common: TokenKind. The parser and the evaluator have three types in common: BinaryOperator, UnaryOperator, and Term. So the parser has four types associated with it just for its connection to the other layers, plus two more that are internal to the parser: Error and Delimiter.

zyxw59 commented 2023-09-29 02:58:34 +00:00 (Migrated from github.com)

with the idea that trait type parameters are for inputs and associated types are for outputs, this would look like

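trait Tokenizer {
    // Output: the kind of token this tokenizer produces.
    type TokenKind;
    // ...
}
// T (input): the token kind received from the tokenizer.
trait ParseContext<T> {
    // Outputs shared with the evaluator.
    type BinaryOperator;
    type UnaryOperator;
    type Term;
    // Internal to the parser.
    type Delimiter;
    type Error;
    // ...
}
// B, U, T (inputs): the binary operator, unary operator, and term types from the parser.
trait Evaluator<B, U, T> {
    type Output;
    // ...
}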
zyxw59 commented 2023-10-01 18:56:57 +00:00 (Migrated from github.com)

it has previously been noted that apart from the methods to access the input string, Tokenizer is basically just Iterator<Item = (Token, Self::TokenKind)>

what if a parser was like an Iterator<Item = Expression>? On the inside it'd still look like the parser trait, but it could be presented as an iterator adapter: something like MyTokenizer::new(input).parse(MyParser::new()).evaluate(MyEvaluator::new()), with the parse and evaluate methods being part of an extension trait with the appropriate bounds.
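A rough sketch of the extension-trait shape described here, under strong simplifying assumptions: `Token`, `Expression`, `Parse`, `ParseExt`, and `tokenize` are all hypothetical stand-ins, the adapter takes no parser argument, and it maps each token to exactly one expression without ever failing, which the real parser does not guarantee.

```rust
/// Hypothetical stand-ins for the library's token and expression types.
#[derive(Debug, PartialEq)]
struct Token(String);

#[derive(Debug, PartialEq)]
struct Expression(String);

/// Iterator adapter that wraps a token iterator and yields expressions.
struct Parse<I> {
    tokens: I,
}

impl<I: Iterator<Item = Token>> Iterator for Parse<I> {
    type Item = Expression;

    fn next(&mut self) -> Option<Expression> {
        // Toy behaviour: every token becomes exactly one expression.
        self.tokens.next().map(|Token(text)| Expression(text))
    }
}

/// Extension trait: adds `.parse()` to anything that iterates over tokens.
trait ParseExt: Iterator<Item = Token> + Sized {
    fn parse(self) -> Parse<Self> {
        Parse { tokens: self }
    }
}

// Blanket impl, so the method is available on every suitable iterator.
impl<I: Iterator<Item = Token>> ParseExt for I {}

/// Toy tokenizer: splits on whitespace.
fn tokenize(input: &str) -> impl Iterator<Item = Token> + '_ {
    input.split_whitespace().map(|s| Token(s.to_string()))
}
```

With these definitions, `tokenize("1 + 2").parse()` is itself an iterator, so an `evaluate` adapter could be chained on in the same way.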

zyxw59 commented 2023-10-01 20:39:26 +00:00 (Migrated from github.com)

source is necessary for exactly one thing: generating an end-of-input span and token for the Null expression that gets inserted if the end of input is reached in the Initial state (either empty input, or the last token has an optional right operand).

…if I changed how I handled optional right operands, in such a way as to get rid of Null, this wouldn't be an issue.

zyxw59 commented 2023-10-02 00:16:44 +00:00 (Migrated from github.com)

> what if a parser was like an Iterator<Item = Expression>? On the inside it'd still look like the parser trait, but it could be presented as an iterator adapter: something like MyTokenizer::new(input).parse(MyParser::new()).evaluate(MyEvaluator::new()), with the parse and evaluate methods being part of an extension trait with the appropriate bounds.

This doesn't work, because the parser can also return errors.

zyxw59 commented 2024-02-18 02:06:00 +00:00 (Migrated from github.com)

The bigger issue is that a single token can parse to multiple expressions (e.g. due to popping things from the stack), so the parser still needs to collect into a queue at least as an intermediate; also, while a tokenizer/parser/evaluator that is totally pipelined as iterators is aesthetically pleasing, it makes logging and error handling far more convoluted (for example, in the fully pipelined setup, a parse error would likely result in a lot of wasted work by the evaluator).
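To illustrate the first point, here is a toy operator-precedence parser (a shunting-yard style sketch, not this library's actual implementation) that reports how many expressions each token appends to the output queue. All names here (`Expr`, `Parser`, `push_token`, `trace`) are hypothetical, and only left-associative binary operators are handled.

```rust
use std::collections::VecDeque;

#[derive(Debug, PartialEq, Clone, Copy)]
enum Expr {
    Num(i64),
    Op(char),
}

fn precedence(op: char) -> u8 {
    match op {
        '*' | '/' => 2,
        _ => 1,
    }
}

struct Parser {
    stack: Vec<char>,      // operators waiting for their right operand
    queue: VecDeque<Expr>, // output in postfix order
}

impl Parser {
    fn new() -> Self {
        Parser { stack: Vec::new(), queue: VecDeque::new() }
    }

    /// Feeds one token; returns how many expressions it appended to the queue.
    fn push_token(&mut self, token: &str) -> usize {
        let before = self.queue.len();
        match token.parse::<i64>() {
            Ok(n) => self.queue.push_back(Expr::Num(n)),
            Err(_) => {
                let op = token.chars().next().expect("non-empty token");
                // Pop every stacked operator that binds at least as tightly.
                while let Some(&top) = self.stack.last() {
                    if precedence(top) < precedence(op) {
                        break;
                    }
                    self.stack.pop();
                    self.queue.push_back(Expr::Op(top));
                }
                self.stack.push(op);
            }
        }
        self.queue.len() - before
    }

    /// Flushes the remaining operators at end of input.
    fn finish(mut self) -> VecDeque<Expr> {
        while let Some(op) = self.stack.pop() {
            self.queue.push_back(Expr::Op(op));
        }
        self.queue
    }
}

/// Runs the parser over whitespace-separated tokens, recording per-token output counts.
fn trace(input: &str) -> (Vec<usize>, Vec<Expr>) {
    let mut parser = Parser::new();
    let counts = input.split_whitespace().map(|t| parser.push_token(t)).collect();
    (counts, parser.finish().into_iter().collect())
}
```

For the input `1 + 2 * 3 - 4`, the `-` token alone appends two expressions (`*` then `+`), while `+` and `*` append none when they are read, so a one-token-in, one-expression-out iterator does not fit without an intermediate queue.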

Reference: mle/selkirk#10