YAML front matter parser does not strip single or double quotes on scalars #260

Closed
JakeWharton opened this issue May 19, 2022 · 6 comments · Fixed by #261

@JakeWharton

Steps to reproduce the problem (provide example Markdown if applicable):

---
title: 'Sixteen corners'
layout: post
---

Last year

Expected behavior:

frontMatter["title"].single() == "Sixteen corners"

Actual behavior:

[Screenshot: Screen Shot 2022-05-18 at 10 56 24 PM]

Note the retained single quotes (') surrounding the value. This is also a problem with double quotes (").

YAML spec dictates behavior on unquoted, single-quoted, and double-quoted scalars: https://yaml.org/spec/1.2.2/#73-flow-scalar-styles

@robinst
Collaborator

robinst commented Jun 1, 2022

Yeah you're right, that's broken. The reason is that the extension currently does some manual (and very limited) YAML parsing, see YamlFrontMatterBlockParser. We should probably never have done that, and should have just depended on a real YAML parser in that extension.

So two ways to fix this:

  1. Extend our manual parser to handle quoting
  2. Depend on a YAML library (which one?) instead, to be able to parse YAML 1.1 or 1.2 (which one?)

Do you have any opinions on those @JakeWharton? If we do 1, we can also consider adding support for retrieving the raw YAML source (as a single String) to YamlFrontMatterVisitor, for people with exotic YAML that want to parse it themselves.
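
For reference, option 1 could look roughly like the sketch below. This is not the code in YamlFrontMatterBlockParser: `QuotedScalars` and `unquote` are hypothetical names, and the handling is deliberately limited. It strips one layer of matching quotes, resolves the `''` escape in single-quoted scalars, and leaves backslash escapes in double-quoted scalars uninterpreted.

```java
// Hypothetical helper, not part of the library.
final class QuotedScalars {
    private QuotedScalars() {}

    /** Strips one layer of matching single or double quotes from a scalar value. */
    static String unquote(String raw) {
        String value = raw.trim();
        if (value.length() >= 2) {
            char first = value.charAt(0);
            char last = value.charAt(value.length() - 1);
            if (first == last && (first == '\'' || first == '"')) {
                String inner = value.substring(1, value.length() - 1);
                if (first == '\'') {
                    // In a single-quoted scalar, '' is the only escape and means one '.
                    return inner.replace("''", "'");
                }
                // Double-quoted: backslash escapes such as \n are NOT interpreted
                // in this sketch; the text between the quotes is returned as-is.
                return inner;
            }
        }
        // Unquoted (plain) scalar, or mismatched quotes: return the trimmed value.
        return value;
    }
}
```

With this, `QuotedScalars.unquote("'Sixteen corners'")` yields `Sixteen corners`, which is what the report expects for the `title` key.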

robinst added a commit that referenced this issue Jun 1, 2022
Extend our manual parser to handle string values that use single or
double quotes. The support is limited and doesn't implement the full
YAML spec (e.g. no support for escapes like `\n`).

At some point we should either depend on a real YAML parser or expose
the raw YAML source so that users can parse it themselves.

Fixes #260.
@robinst
Collaborator

robinst commented Jun 1, 2022

@JakeWharton
Author

Yeah I ended up doing a form of the String-extraction where I simply pre-processed the input data to conditionally extract the front matter per its "specification". The upside is I can parse front matter on all files, not just markdown (not sure I specifically need this). The downside is that I lose this library's types as a unified model and instead have my own composite type of front matter + markdown.
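
A minimal sketch of that kind of pre-processing, for anyone following along. It is not my actual code: `FrontMatterSplitter`, `Parts`, and `split` are hypothetical names, and it assumes LF line endings, an opening `---` on the very first line, and a closing `---` on its own line followed by a newline. The raw YAML string it returns would then be handed to a real YAML parser.

```java
// Hypothetical pre-processor, not part of any library.
final class FrontMatterSplitter {
    /** Hypothetical composite type: raw YAML text (null when absent) plus the remaining body. */
    static final class Parts {
        final String frontMatter;
        final String body;

        Parts(String frontMatter, String body) {
            this.frontMatter = frontMatter;
            this.body = body;
        }
    }

    private static final String OPEN = "---\n";
    private static final String CLOSE = "\n---\n";

    private FrontMatterSplitter() {}

    static Parts split(String input) {
        // Front matter only counts when the very first line is the delimiter.
        if (!input.startsWith(OPEN)) {
            return new Parts(null, input);
        }
        // Start the search at the newline that ends the opening delimiter so that
        // an empty front matter block ("---\n---\n") is still recognized.
        int close = input.indexOf(CLOSE, OPEN.length() - 1);
        if (close < 0) {
            // No closing delimiter: treat the whole input as body.
            return new Parts(null, input);
        }
        String yaml = input.substring(OPEN.length(), close + 1);
        String body = input.substring(close + CLOSE.length());
        return new Parts(yaml, body);
    }
}
```

The `body` part can be passed to the markdown parser unchanged, which is where the composite type of front matter + markdown comes from.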

I don't recall whether I'm using the 1.1 or 1.2 version of SnakeYAML. I'm under-educated on the difference.

For this specific quoting issue, you must quote values if they contain a colon in order for Jekyll's parser to correctly parse the value. Since I'm sharing front matter-containing markdown files between Jekyll and my tool, I need to honor that. My example above doesn't have a colon, but I tend to copy/paste the last blog post when I create a new one, and so the quoting has now persisted to about half my files.
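
To make the colon case concrete, here is a rough sketch of the check. It relies on the YAML plain-scalar rule that a colon followed by a space (or a colon at the end of the value) is read as a key/value separator, so such values cannot be written unquoted. `PlainScalarCheck`, `needsQuoting`, and `quoteIfNeeded` are hypothetical names, and the check is far from exhaustive: it ignores ` #` comments, leading indicator characters, and so on.

```java
// Hypothetical helper, not part of any library; covers only the colon case.
final class PlainScalarCheck {
    private PlainScalarCheck() {}

    /** Returns true when a colon in the value would be read as a mapping separator. */
    static boolean needsQuoting(String value) {
        return value.contains(": ") || value.endsWith(":");
    }

    /** Wraps the value in single quotes when needed, doubling any embedded single quotes. */
    static String quoteIfNeeded(String value) {
        if (!needsQuoting(value)) {
            return value;
        }
        return "'" + value.replace("'", "''") + "'";
    }
}
```

So a title like `Sixteen corners` can stay unquoted, while a title containing `: ` has to be written in quotes, and a front matter parser then has to strip those quotes again.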

@robinst
Copy link
Collaborator

robinst commented Jun 2, 2022

@JakeWharton
Author

Yep! Looks good.

@robinst
Collaborator

robinst commented Jun 2, 2022
