HarfBuzz does OpenType shaping, that is, transforming strings of Unicode characters into lists of glyphs with positioning. The significance of this can be hard to appreciate for someone used to the Latin script, as that needs very little shaping: kerning is often the only thing that’s absolutely necessary.
But in complex scripts, most notably the Indic ones, there’s a lot going on. Several Unicode characters can merge into one glyph under certain circumstances, or one character can split into several glyphs, and relative positioning along both the x and y axes is imperative.
A reason that OpenType shaping is complex is that part of the rules for what to do will be found in the font, and part will need to be hard-coded in the code implementing it.
If you’re going to roll your own text renderer, you’ll have to care about the following areas:
Rasterization/rendering to bitmaps, including hinting (notoriously difficult; old-style TrueType hinting instructions are bytecode, so you’ll be writing a tiny VM for this)
Shaping (kerning at a minimum, full OpenType shaping for international support)
BiDi (for full international support, primarily Hebrew and Perso-Arabic)
A caching system for rendered glyphs and shaped text runs, as it will be too slow to redo this work each time you want to render some text
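To give a feel for the caching item, here is a minimal sketch of an LRU cache for rasterized glyphs. Everything in it is made up for illustration: the `GlyphCache` class, the `(font_id, glyph_id, pixel_size)` key, and the `rasterize` callback stand in for whatever your renderer actually uses.

```python
from collections import OrderedDict


class GlyphCache:
    """Hypothetical LRU cache for rasterized glyph bitmaps."""

    def __init__(self, rasterize, capacity=1024):
        # rasterize(font_id, glyph_id, pixel_size) -> bitmap; supplied by the caller.
        self._rasterize = rasterize
        self._capacity = capacity
        self._entries = OrderedDict()
        self.misses = 0

    def get(self, font_id, glyph_id, pixel_size):
        key = (font_id, glyph_id, pixel_size)
        if key in self._entries:
            # Cache hit: mark the entry as most recently used.
            self._entries.move_to_end(key)
            return self._entries[key]
        # Cache miss: rasterize once and remember the result.
        self.misses += 1
        bitmap = self._rasterize(font_id, glyph_id, pixel_size)
        self._entries[key] = bitmap
        if len(self._entries) > self._capacity:
            # Evict the least recently used entry.
            self._entries.popitem(last=False)
        return bitmap
```

A cache for shaped runs could follow the same pattern, keyed on the text of the run plus the font and size instead of a single glyph.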
Let’s just say that I do not recommend going this route unless you’re prepared to spend a lot of time on it.
I’ve got all that. I just needed to convert a string of characters into a list of glyph IDs.
For context, I’m doing a code editor.
I don’t use HarfBuzz for shaping or whatever, since I planned on rendering single lines of monospaced text. I can do everything except string->glyphs conversion.
Just trying to implement basic features such as ligatures is incredibly hard, since there’s almost no documentation. Without it, you can’t make the assumptions you’d need to take shortcuts and optimize. I don’t know if HarfBuzz uses a source of documentation that I haven’t been able to find, or maybe they are just way smarter than me, or if fonts are made so that they work with HarfBuzz instead of the other way around.
As someone trying to have as few dependencies as possible, it is a struggle. But at the same time, HarfBuzz saved me so much time.
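For reference, the easy first step of string->glyphs is the nominal character-to-glyph mapping that comes from the font's cmap table. The sketch below assumes the cmap has already been parsed into a plain dict; `TOY_CMAP` and its glyph IDs are invented for the example, not taken from a real font. The hard part is what happens after this step, when substitutions are applied to the glyph list.

```python
NOTDEF = 0  # glyph ID 0 is the .notdef ("missing glyph") glyph in OpenType fonts


def map_to_glyphs(text, cmap):
    """Map each code point to its nominal glyph ID, falling back to .notdef."""
    return [cmap.get(ord(ch), NOTDEF) for ch in text]


# Invented code point -> glyph ID table, standing in for a parsed cmap.
TOY_CMAP = {ord("f"): 10, ord("i"): 11, ord("="): 12, ord(">"): 13}

glyph_ids = map_to_glyphs("fi?", TOY_CMAP)  # "?" is not in the toy table
```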
EDIT:
I don’t do my own glyph rasterization, but that’s because I haven’t gotten to it yet, so I do use a library. I don’t know if it’s going to be harder than string->glyphs, but I doubt it.
It would make sense that a code editor could use a more limited subset of text rendering that could be more optimized.
Perhaps a bit surprisingly, Microsoft actually has pretty good documentation on OpenType. Here’s info on what shaping applies to “standard” scripts:
https://learn.microsoft.com/en-us/typography/script-development/standard
And here’s the landing page for the latest OpenType spec:
https://learn.microsoft.com/en-us/typography/opentype/spec/
Specifically for ligatures, you’re looking for the liga feature, which is defined in the font’s GSUB table.
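As a rough sketch of what a ligature substitution does once you have glyph IDs: a GSUB ligature lookup is keyed on the first glyph of a sequence, and each candidate lists the remaining component glyphs plus the ligature glyph that replaces them, in order of preference. The code below assumes that data has already been pulled out of the font into a dict; the function name, the dict layout, and the glyph IDs (10 = f, 11 = i, 20 = fi, 21 = ffi) are all made up for illustration. It ignores lookup flags, script/language selection, and contextual lookups, so treat it as a simplification rather than what HarfBuzz does.

```python
def apply_ligatures(glyphs, ligature_sets):
    """Replace glyph sequences with ligature glyphs, first match wins.

    ligature_sets maps a first glyph ID to a list of
    (remaining_component_ids, ligature_glyph_id) pairs, in preference order.
    """
    out = []
    i = 0
    while i < len(glyphs):
        for rest, ligature in ligature_sets.get(glyphs[i], ()):
            end = i + 1 + len(rest)
            # Compare the glyphs following position i with the remaining components.
            if tuple(glyphs[i + 1:end]) == tuple(rest):
                out.append(ligature)
                i = end  # skip every glyph the ligature consumed
                break
        else:
            # No candidate matched: keep the glyph unchanged.
            out.append(glyphs[i])
            i += 1
    return out


# Invented data: "ffi" is listed before "fi" so the longer match is preferred.
TOY_LIGATURES = {10: [((10, 11), 21), ((11,), 20)]}

shaped = apply_ligatures([10, 10, 11], TOY_LIGATURES)  # f f i
```

For a code editor, the useful consequence is that one output glyph can cover several input characters, so cursor positions need to be mapped back to characters separately.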