Format a Temporal object into a user-supplied string format #5

Open
ptomato opened this issue Feb 12, 2021 · 23 comments

Comments

@ptomato
Collaborator

ptomato commented Feb 12, 2021

Similar to #2, but for string formatting instead of string parsing.

If #2 is adopted, an API using the same microformat could be proposed.

Advantages:

Sometimes it's necessary to obtain a string that fits a particular format that is neither ISO 8601 nor locale-sensitive. For these cases, third-party date libraries usually provide a formatting API.

Concerns:

In general, for human-readable strings, developers should not customize the format themselves, and use only the locale-specific formats provided by Intl.DateTimeFormat. While this functionality might be useful for formatting strings that must fit a certain format for machine-readability, providing such an API might result in worse user experiences because developers are tempted to use it for human-readable strings instead of Intl.DateTimeFormat.

Unlike parsing, this kind of functionality is easy to build in business logic.
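As a rough illustration of the "easy to build in business logic" point, a fixed machine-readable format can be assembled with a template literal and `padStart`. This is only a sketch: `formatTimestamp` is a hypothetical helper, and it assumes an input that exposes numeric `year`, `month`, `day`, `hour`, `minute` and `second` properties, as `Temporal.PlainDateTime` does.

```javascript
// Hypothetical helper, not part of Temporal or any library.
// Accepts any object with numeric year/month/day/hour/minute/second
// properties, such as a Temporal.PlainDateTime.
function formatTimestamp({ year, month, day, hour, minute, second }) {
  const pad = (n, width = 2) => String(n).padStart(width, '0');
  return `${pad(year, 4)}-${pad(month)}-${pad(day)} ${pad(hour)}:${pad(minute)}:${pad(second)}`;
}

// A plain object stands in for a Temporal.PlainDateTime here.
const exampleTimestamp = formatTimestamp({
  year: 2021, month: 2, day: 12, hour: 9, minute: 5, second: 0,
});
console.log(exampleTimestamp); // 2021-02-12 09:05:00
```

The trade-off is that every new format needs its own function, which is where the requests for a pattern-based API come from.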

Prior art:

TBD

Constraints / corner cases:

TBD

@CarterLi

CarterLi commented Aug 5, 2021

All I want is something like https://date-fns.org/v2.23.0/docs/format

eg. format(Date.now(), 'yyyy-MM-dd HH:mm:ss')

I have read the doc of DateTimeFormat and I don't think it can match my requirement.
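For reference, the kind of call being asked for can be approximated in a few lines. The sketch below is not date-fns and not a proposed Temporal API: `formatPattern` and `PATTERN_TOKENS` are hypothetical names, only six zero-padded numeric tokens are handled, there is no escaping of literal text, and the input is assumed to be an object with `Temporal.PlainDateTime`-like numeric fields.

```javascript
// Hypothetical sketch of a tiny pattern formatter supporting a handful of
// numeric, zero-padded fields.
const PATTERN_TOKENS = {
  yyyy: (f) => String(f.year).padStart(4, '0'),
  MM: (f) => String(f.month).padStart(2, '0'),
  dd: (f) => String(f.day).padStart(2, '0'),
  HH: (f) => String(f.hour).padStart(2, '0'),
  mm: (f) => String(f.minute).padStart(2, '0'),
  ss: (f) => String(f.second).padStart(2, '0'),
};

// Matches any supported token; everything else is copied through unchanged.
const PATTERN_TOKEN_RE = /yyyy|MM|dd|HH|mm|ss/g;

function formatPattern(fields, pattern) {
  return pattern.replace(PATTERN_TOKEN_RE, (token) => PATTERN_TOKENS[token](fields));
}

// A plain object stands in for a Temporal.PlainDateTime here.
const patternFields = { year: 2021, month: 8, day: 5, hour: 14, minute: 3, second: 9 };
console.log(formatPattern(patternFields, 'yyyy-MM-dd HH:mm:ss')); // 2021-08-05 14:03:09
```

Anything beyond this (month names, quoting, week numbers, non-Gregorian calendars) is where the design questions raised in this thread start to matter.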

@ryzokuken

I think strongly that Temporal should stick to the standards here. Format-transforms are a useful use-case for third-party modules.

@CarterLi

CarterLi commented Aug 6, 2021

> I think strongly that Temporal should stick to the standards here. Format-transforms are a useful use-case for third-party modules.

It IS the standard: https://www.unicode.org/reports/tr35/tr35-dates.html#Date_Field_Symbol_Table

Third-party libraries like moment, luxon, dayjs and date-fns are already very easy to use, and their APIs are familiar. People choose a new standard object that is not yet stable, and that has learning costs and browser compatibility concerns, mainly because it can reduce third-party library dependencies and shipped code size.

If people still have to use third-party libraries for a use case as common as string formatting, why should they have to learn something completely different?

@ryzokuken

ryzokuken commented Aug 9, 2021

@CarterLi I disagree. First of all, LDML is not a timestamp format standard, it is meant for localized formatting for i18n.

We already allow localized formatting for Temporal objects via Intl.DateTimeFormat, so any such formatting can (and should) only be done using Intl.DateTimeFormat via Temporal.Foo#toLocaleString. To a large extent, Intl.DateTimeFormat already allows customized formats like these, with the added benefit of making them locale-friendly.

Anything that's not "canonical timestamp format for interchange" and "localized format for display" should be done using third-party modules.

@chrisveness

I have a requirement to format some dates as either ‘8 Sep’ or ‘8 Sep 2020’ (depending on whether the year is the current year or not).

toLocaleString('en') appears to insist that I want the month before the day, and the year separated by a comma – but I don't!

Is there a way to achieve those formats with the current APIs available?

If not, current APIs seem too rigid, and I would like to see custom formatting using tokens.

It has been an issue in the past that there have only been de-facto standards for such tokens, and it would be problematic to adopt any of these on a presumably arbitrary basis.

However, the Unicode Consortium is a standards body, and as part of the Locale Data Markup Language (LDML) have defined a standard set of pattern fields for date formatting (https://unicode-org.github.io/cldr/ldml/tr35-dates.html#Date_Format_Patterns).

As LDML is intended explicitly for locale-specific applications, and includes references to non-Gregorian calendars (such as the Chinese lunar calendar), it looks as if it would make a good fit for the Temporal proposal.

I hope I don't migrate to Temporal, and then still have to load the whole of a third-party module such as moment just to be able to generate custom formats.

Surely Unicode TR35 Part 4 (Dates) Section 8 (Date Format Patterns) would make a good basis for Temporal custom formatting?

@ptomato
Collaborator Author

ptomato commented Sep 14, 2021

The discussion thread starting from tc39/ecma402#554 (comment) might prove illuminating as to why there is caution about a formatter API.

I'm not sure whether your requirement

> toLocaleString('en') appears to insist that I want the month before the day, and the year separated by a comma – but I don't!

means that you want to format dates with the day before the month and no comma, even in the en locale where that would be "incorrect" (at least, according to CLDR); or it means that you do actually want an internationalized date format but you are looking for the en-GB locale (or another en-?? locale, but I guessed GB based on the location in your GitHub profile).

In any case Temporal.Now.plainDateISO().toLocaleString('en-GB', { dateStyle: 'medium' }) seems to work for your purpose?
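If the day-month order is wanted regardless of what the locale prescribes, one option is to take the localized pieces from `formatToParts()` and reassemble them by hand. The following is a sketch only: `formatDayMonth` is a hypothetical helper, it uses a legacy `Date` so that it runs without Temporal, and the month abbreviation comes from the engine's locale data, so the exact text can differ between environments.

```javascript
// Hypothetical helper: takes the localized pieces from formatToParts() and
// reassembles them in a fixed day-month(-year) order.
function formatDayMonth(date, currentYear, locale = 'en') {
  const formatter = new Intl.DateTimeFormat(locale, {
    day: 'numeric',
    month: 'short',
    year: 'numeric',
    timeZone: 'UTC', // keeps the example independent of the host time zone
  });

  // Collect the non-literal parts into an object keyed by part type.
  const parts = {};
  for (const { type, value } of formatter.formatToParts(date)) {
    if (type !== 'literal') parts[type] = value;
  }

  const dayMonth = `${parts.day} ${parts.month}`;
  // Compare against the UTC year to match the formatter's time zone.
  return date.getUTCFullYear() === currentYear ? dayMonth : `${dayMonth} ${parts.year}`;
}

const sampleDate = new Date(Date.UTC(2020, 8, 8)); // 8 September 2020
console.log(formatDayMonth(sampleDate, 2020)); // day and month only
console.log(formatDayMonth(sampleDate, 2021)); // day, month and year
```

Hard-coding the order like this gives up the localization that `toLocaleString()` provides, which is the trade-off this thread keeps coming back to.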

> I hope I don't migrate to Temporal, and then still have to load the whole of a third-party module such as moment just to be able to generate custom formats.

It's more likely that a third-party module specifically for custom formatting would spring up, rather than loading Moment.

@craigmiller160

I can't believe this is even a debate. Formatting strings based on user-supplied specifiers is a standard across a wide range of languages and libraries. Sticking to only the "standards" is the kind of decision that gets made in ivory tower rooms of computer theory and completely ignores the needs of programmers dealing with the real, messy world that we have to code for.

@ptomato
Collaborator Author

ptomato commented Feb 4, 2022

I mean... this is a repo for incubating a proposal for the standard, and there are other programmers in this messy world that have needs that would conflict with just copying the API from date-fns ... so I guess I don't understand why it's so unbelievable that there is a discussion about this? In any case, it would be helpful if you could provide some details about your use case. The more we know about how people would use this API, the better.

@Ariane-B

Ariane-B commented Aug 5, 2022

Today, I wanted to get the following date/time format:

French: 2022-12-30 à 20 h 22 OR 2022-12-30 à 20:22

(20:22 isn't technically the format for use in sentences in French, but it's widely accepted as a more compact, more practical format to use for displaying times, compared to 20 h 22. Our design team isn't 100 % sure which format they want at this point.)

English: 2022-12-30 at 20:22

I couldn't find a way to do that with the current toLocaleString or DateTimeFormat options on offer.

Maybe there's an obscure locale that allows it, but I couldn't find full documentation anywhere of all the supported locales and the formats they provide, which isn't making the search easy.

Anyway, with this, I was able to get pretty close:

myDate.toLocaleString('fr-CA', {
  year: 'numeric',
  month: '2-digit',
  day: '2-digit',
  hour: 'numeric',
  minute: '2-digit',
})

// This is the same
myDate.toLocaleString('fr-CA', {
  dateStyle: 'short',
  timeStyle: 'short',
})

// Output: 2022-08-05 13 h 06

But the only way to get a separator in there that I found was:

const date = new Date()
const formattedDate = date.toLocaleString('fr-CA', {
  dateStyle: 'short',
})
const formattedTime = date.toLocaleString('fr-CA', {
  timeStyle: 'short',
})
const formattedDateTime = `${ formattedDate } à ${ formattedTime }`

I do have a string translation library with variable insertion that allows me to do the English version of this, adding "at" instead of "à", pretty well, but isn't it unfortunate that I had to do that?

Basically, we have locales that define the "visuals" of the format. The order of the numbers. The separators between the numbers. The separators between hours and minutes. The separators between the date and time.

All of these factors can't be overridden. It's basically just magic that happens behind the scenes that we have no power over. I really, really wish I could pick fr-CA's defaults and work from there instead of having to either take all of its magic or none of it.

Additionally, I'm lucky; I'm from Canada, and the ISO-8601 representation of a date (just the 2022-12-30 part; not the whole machine-friendly string) is what JS understands as a short date for me. But what if someone wants that, but they're not aware of locale en-CA? The built-in formatter is of no help to them.

One more point: What do I do if I'm writing a French date but I want the colon representation of the time (20:22 instead of 20 h 22)? Should I just... pretend the locale is English?

And lastly: what if people want to put the date in bold? Should Temporal help with wrapping stuff in HTML tags/markdown markers or not? Because even if you offer separator handling, if people want to format the date differently, they kinda have to do string concatenation anyway, unless there's some way to wrap things on offer.

@ptomato
Collaborator Author

ptomato commented Aug 5, 2022

Thanks for the detailed use cases! I agree, these would be good use cases for a string format minilanguage.

The point about starting from localized defaults and tweaking a few things is, I fear, difficult to write code for in a general case, so it's probably best to consider it for one or a few locales, like fr-CA and en-CA in your case.

You can pick fr-CA's defaults and work from there, using formatToParts(), which gives you the same localized string as toLocaleString() but broken up into segments that are tagged with the type of data that they represent. It probably gives you more information than you need and so your code using it would be on the verbose side, but could be encapsulated. For example, doing some of the stuff you mentioned:

function formatCustom(plainDateTime) {
  // Start from fr-CA's short date and time styles.
  const formatter = new Intl.DateTimeFormat('fr-CA', {
    dateStyle: 'short',
    timeStyle: 'short'
  });

  // Collect the non-literal parts by type, dropping the locale's own separators.
  const { year, month, day, hour, minute } = formatter
    .formatToParts(plainDateTime)
    .reduce((result, { type, value }) => {
      if (type !== 'literal') {
        result[type] = value;
      }
      return result;
    }, {});

  // Reassemble with custom separators and markup around the time.
  return `${year}-${month}-${day} à <strong>${hour}:${minute}</strong>`;
}

It would be easier to do things like this if formatToParts() gave more information about the purpose of each separator that it splits out, which is an issue that you could raise at https://github.com/tc39/ecma-402.

I do think that HTML and Markdown are out of scope for a facility that's part of the core JS language. Early JS included special HTML facilities like "my string".bold() that wrapped strings in HTML tags but those have been discouraged for a long time. In my opinion, a good formatting API should allow this but not give it special consideration.

@Ariane-B

Ariane-B commented Aug 5, 2022

What if we could do something like this?

const localeDateFormatParams = Intl.getDateFormatParamsForLocale('fr-CA');
console.log(localeDateFormatParams);

// Would output something like:
const localeDateFormatParams = {
  separators: {
    afterDay: '-',
    afterMonth: '-',
    afterDate: ' à ',
  },
  // And much much more info about the minute details of how this locale works
}

And then use it like:

// An object for a locale would mean "I'm about to define to you the slightest details of a locale that may or may not exist"
const formatter = new Intl.DateTimeFormat(localeDateFormatParams, {
  dateStyle: 'short',
  timeStyle: 'short'
})

As for your code example, the existence of formatToParts is interesting and I didn't know about it, but wouldn't it be simpler to simply format the date and time separately in fr-CA or en-CA or a combination of both? Way fewer lines.

@ptomato
Collaborator Author

ptomato commented Aug 5, 2022

It would be fewer lines to use dateStyle and timeStyle separately if you only need to change the date/time separator string, but I intentionally did it the long way to illustrate what you might do to replace the h separator with a colon as well.

ECMA-402 have previously indicated that they are skeptical about such changes to Intl.DateTimeFormat: tc39/ecma402#554 (comment). There is a data-driven API being discussed at tc39/ecma402#210 that I think would address this.

@Ariane-B

Ariane-B commented Aug 5, 2022

Oh, I see! Thanks. :)

My sneaky way out of the colon issue was to use en-CA for the time no matter what language the page is being displayed in lol

@solendil

solendil commented Nov 21, 2022

It seems to me there is a confusion between localization (for human beings) and formatting (for other computing uses).

The Intl API might be suitable for localization, but it is definitely not appropriate for formatting. Most of my time is actually spent interacting with non-standard systems and providing strings like 202211210000, 00H12H, 301022. It's a breeze with the standard "format" APIs provided by all major date and time libraries. But it's a really sore point with Temporal and Intl :(
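To make the workaround concrete, here is one way such strings can be produced today without a format API, by slicing the digits of an ISO 8601 timestamp. It is a sketch under assumptions: the helper names are hypothetical, the values are in UTC, and reading `301022` as day-month-year (ddMMyy) is a guess at what the target system expects.

```javascript
// Hypothetical helpers that derive fixed-width strings from the digits of
// an ISO 8601 timestamp instead of going through Intl.
function isoDigits(date) {
  // "2022-11-21T00:00:00.000Z" -> "20221121000000000"
  return date.toISOString().replace(/\D/g, '');
}

// yyyyMMddHHmm
function compactMinute(date) {
  return isoDigits(date).slice(0, 12);
}

// ddMMyy (assumed field order)
function dayMonthYear2(date) {
  const digits = isoDigits(date);
  return digits.slice(6, 8) + digits.slice(4, 6) + digits.slice(2, 4);
}

console.log(compactMinute(new Date(Date.UTC(2022, 10, 21)))); // 202211210000
console.log(dayMonthYear2(new Date(Date.UTC(2022, 9, 30)))); // 301022
```

It works, but the index arithmetic is easy to get subtly wrong, which is the pain point being described.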

@leonyu

leonyu commented Jan 2, 2023

I think the problem with string formatting is the concept and idioms themselves are very English-centric.

For example, most implementations assume

  • MMMM to be January and MMM to be Jan
  • DDDD to be Monday and DDD to be Mon

However, this does not even translate to other Latin-alphabet languages. In French, for example, the equivalent of the 3-letter month is 4 letters -- janv. for Jan and févr. for Feb. Does that mean we need to use MMMM for short month in French? And maybe MMMMM for long form? (Ref).

Also how does the string format apply to languages where people do not use space separators? (e.g. Asian languages) Does MMMM mean two 2-letter MM? (e.g. MM+MM) or does it mean 4-letter MMMM?
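The variation in abbreviation length can be checked directly with `Intl.DateTimeFormat`. The snippet below is a small sketch (`shortMonthNames` is a hypothetical helper); the exact strings depend on the locale data shipped with the engine, so no particular output is claimed here.

```javascript
// Lists the abbreviated month names that Intl.DateTimeFormat produces for a
// locale. The exact strings come from the engine's locale data.
function shortMonthNames(locale) {
  const formatter = new Intl.DateTimeFormat(locale, { month: 'short', timeZone: 'UTC' });
  return Array.from({ length: 12 }, (_, month) =>
    formatter.format(new Date(Date.UTC(2023, month, 1)))
  );
}

for (const locale of ['en', 'fr', 'ja']) {
  const names = shortMonthNames(locale);
  // Print each name with its length in UTF-16 code units.
  console.log(locale, names.map((name) => `${name} (${name.length})`).join(', '));
}
```

A pattern letter count such as MMM therefore cannot be read as "exactly three characters" once the output is localized.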

@Ariane-B

Ariane-B commented Jan 2, 2023

We just need to find the language with the most different ways to express, say, a month, and prepare a scale of complexity based on that. After that, it's probably relatively trivial to just simplify for the simpler languages. The concept of a short month doesn't exist in your language at all? Great! When you ask for a short month, you get the only month that exists.

@leonyu

leonyu commented Jan 2, 2023

OK, in that case, let's say the hypothetical "most different" language is some French dialect locale and their 4-letter word abbreviation for Month is represented by MMMM. The 1-3 letter spots (M, MM, MMM) are occupied by different numeric month representations in their locale (all in popular usage).

Since as you say just "simplify for the simpler languages" and English is simpler, English would get:

  • M, MM, MMM -- numeric month
  • MMMM -- Jan, Feb, Mar..
  • MMMMM -- January, February...

Would you be happy with that? I am pretty sure not, since that is obviously illogical for English speakers.

@Ariane-B

Ariane-B commented Jan 2, 2023

I'd be satisfied with that. It may not be the ideal solution, but it'd be a workable solution.

A workable solution is much better than "let's not have a solution because there is no absolutely perfect one".

Though "MMMM" and such isn't the only model in existence. We could, for example, do templates with keywords that would be a bit more explicit than repeated letters.

This is just me improvising, but keywords like ${ monthTwoLetters } would be hard to misunderstand. Or maybe we could take a page out of PHP's book with percent signs and lowercase/uppercase letters. Not explicit and will require frequent referring to the docs, but short.

There are lots of possibilities. We should strive to choose the one(s) that will apply best to the most use cases. But even if there isn't a solution that will perfectly match every use case, I think it'll still be a significant win if we can cover most of them.

@pirate

pirate commented Mar 8, 2024

> I think the problem with string formatting is the concept and idioms themselves are very English-centric.

English is, however, the de-facto standard in programming / developer-facing strings, and that's not going to be changed by one datetime library. Introducing location-dependent variation in machine-readable dates would be surprising even to non-English-speaking devs. Let's follow the principle of least surprise here!

The primary use-case for strftime is not producing human-readable localized strings, but rather producing consistent machine-readable/developer-facing strings for things like log files, console output, model fields, filenames, URLs, DSLs, etc., where ISO-8601 is not suitable and the localization of toLocaleString/DateTimeFormat actually hinders consistency.

As wonderful as they are, ISO-8601 and RFC 3339 do not suit all use-cases; there are reasons to need things like these:

  • date +"%Y-%m-%d_%H-%M-%S_%s" -> 2024-03-08_12-12-50_1709928770 (for filenames)
  • date '+%Y:%s%5N' -> 2024:170992885283002 for a year + unix timestamp with specific precision (e.g. for a yearly report file where a human needs to be able to see what year it applies to, but the exact time should be only machine-readable ts format to avoid implying it's a monthly/daily report to the human)
  • date +"%m/%d/%H/%M:%S" -> 03/08/12/17:09 for assembling a URL where only month, day, and hour are folders but the minute+second entries are a file
  • for preparing dates for DSLs (e.g. SQL, JQL, GQL, Excel formulas, config files like INI/YAML, etc. can all store and expect a wide variety of delimiters & formats in real world applications)
  • for debugging output like year: %Y month: %m day: %d (OFFSET=%z) where you want to interleave strings or print any of the other details offered by https://strftime.org/ in a specific format to console
  • etc. just think of more constraints in real life around special characters, whitespace, implied precision, etc. and you can quickly find situations where machine readable timestamps are vastly easier to format in strftime-style than trying to shoehorn Intl.DateTimeFormat or manually slice and re-assemble ISO-8601.

Of course you can technically achieve these with some library-free hand-written string slicing and assembling code right now: tc39/proposal-temporal#2501 (comment) but the whole point of strftime and yy:mm:dd equivalents is to remove these incredibly common, often-repeated, often subtly broken hand-written string formatters from the world's codebases.
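For a sense of scale, a deliberately tiny strftime-style formatter might look like the sketch below. It is an illustration, not a proposal: `strftimeUTC` and `STRFTIME_DIRECTIVES` are hypothetical names, only `%Y %m %d %H %M %S %s` and `%%` are handled, and it reads the UTC fields of a legacy `Date`, so the output differs from a local-time `date` invocation by the host's UTC offset.

```javascript
// Hypothetical, deliberately tiny strftime-style formatter. It reads the UTC
// fields of a Date and supports only %Y %m %d %H %M %S %s and %%.
const STRFTIME_DIRECTIVES = {
  Y: (d) => String(d.getUTCFullYear()).padStart(4, '0'),
  m: (d) => String(d.getUTCMonth() + 1).padStart(2, '0'),
  d: (d) => String(d.getUTCDate()).padStart(2, '0'),
  H: (d) => String(d.getUTCHours()).padStart(2, '0'),
  M: (d) => String(d.getUTCMinutes()).padStart(2, '0'),
  S: (d) => String(d.getUTCSeconds()).padStart(2, '0'),
  s: (d) => String(Math.floor(d.getTime() / 1000)), // seconds since the Unix epoch
  '%': () => '%',
};

function strftimeUTC(date, format) {
  // Unknown directives are left in the output untouched.
  return format.replace(/%(.)/g, (match, code) =>
    Object.hasOwn(STRFTIME_DIRECTIVES, code) ? STRFTIME_DIRECTIVES[code](date) : match
  );
}

const stamp = new Date(Date.UTC(2024, 2, 8, 12, 12, 50));
console.log(strftimeUTC(stamp, '%Y-%m-%d_%H-%M-%S_%s')); // 2024-03-08_12-12-50_1709899970
console.log(strftimeUTC(stamp, 'year: %Y month: %m day: %d')); // year: 2024 month: 03 day: 08
```

Even this small version has to make decisions about time zone, padding and unknown directives, which is part of why hand-rolled copies tend to drift apart.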

There's a reason strftime/TR35 are so popular already, and we cannot honestly say that all of their existing use-cases are mis-uses (even if we're trying to push people towards localizing all datetimes by default). I think Python's datetime library strikes a great balance here by offering both strftime and DateTimeFormat-style localization options.

strftime/TR35/etc. are so universal that I struggle to find a language that doesn't provide a standard library equivalent:

It doesn't need to be the recommended or primary way of producing date strings, but because it's such an industry standard it would remove the need for external packages just to access this feature. As it stands now, strftime is one of the main reasons why people install external JS libraries like moment.js, date-fns, etc. and date formatting is so needed in JS that it dominates the Google search suggestions for Javascript format ....