Recognize your users' handwriting
The Handwriting Recognition API allows you to recognize text from handwritten input as it happens.
What is the Handwriting Recognition API?
The Handwriting Recognition API allows you to convert handwriting (ink) from your users into text. Some operating systems have long included such APIs, and with this new capability, your web apps can finally use this functionality. The conversion takes place directly on the user's device, works even in offline mode, all without adding any third-party libraries or services.
This API implements so-called "on-line" or near real-time recognition. This means that the handwritten input is recognized while the user is drawing it by capturing and analyzing the single strokes. In contrast to "off-line" procedures such as Optical Character Recognition (OCR), where only the end product is known, on-line algorithms can provide a higher level of accuracy due to additional signals like the temporal sequence and pressure of individual ink strokes.
Suggested use cases for the Handwriting Recognition API
Example uses include:
- Note-taking applications where users want to capture handwritten notes and have them translated into text.
- Forms applications where users can enter data quickly with a pen or finger when time is short.
- Games that require filling in letters or numbers, such as crosswords, hangman, or sudoku.
Current status
The Handwriting Recognition API is available from Chromium 99.
How to use the Handwriting Recognition API
Feature detection
Detect browser support by checking for the existence of the createHandwritingRecognizer() method on the navigator object:
if ('createHandwritingRecognizer' in navigator) {
  // 🎉 The Handwriting Recognition API is supported!
}
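In practice, you might wrap this check in a small helper so the rest of your app can fall back to a plain text field. The following is only a sketch: the helper name supportsHandwritingRecognition and the choice to pass the navigator object in as a parameter (so the check can be exercised outside a browser) are assumptions made for this example, not part of the API.

```javascript
// Hypothetical helper: returns true if the given navigator-like object
// exposes the createHandwritingRecognizer() method.
function supportsHandwritingRecognition(nav) {
  return Boolean(nav) && typeof nav.createHandwritingRecognizer === 'function';
}

// Usage in a browser (the element ID is made up for this example):
// document.querySelector('#draw-button').hidden =
//   !supportsHandwritingRecognition(navigator);
```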
Core concepts
The Handwriting Recognition API converts handwritten input into text, regardless of the input method (mouse, touch, pen). The API has four main entities:
- A point represents where the pointer was at a particular time.
- A stroke consists of one or more points. The recording of a stroke starts when the user puts the pointer down (i.e., clicks the primary mouse button, or touches the screen with their pen or finger) and ends when they raise the pointer back up.
- A drawing consists of one or more strokes. The actual recognition takes place at this level.
- The recognizer is configured with the expected input language. It is used to create an instance of a drawing with the recognizer configuration applied.
These concepts are implemented as specific interfaces and dictionaries, which I'll cover shortly.
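To make the hierarchy concrete, here is how the first three entities nest when written as plain JavaScript data. This is only a mental model built from ordinary objects and arrays. The real API uses dedicated interfaces, and the helper countPoints is made up for this illustration.

```javascript
// A point: where the pointer was (x, y) and when (t, in milliseconds).
const point = { x: 10, y: 20, t: 0 };

// A stroke: the points recorded between pointer down and pointer up.
const stroke = [point, { x: 12, y: 24, t: 16 }, { x: 15, y: 30, t: 32 }];

// A drawing: all strokes entered so far.
const drawingModel = [stroke, [{ x: 40, y: 20, t: 0 }]];

// Hypothetical helper: total number of points across all strokes.
function countPoints(strokes) {
  return strokes.reduce((total, current) => total + current.length, 0);
}

console.log(countPoints(drawingModel)); // 4
```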
Creating a recognizer
To recognize text from handwritten input, you need to obtain an instance of a HandwritingRecognizer by calling navigator.createHandwritingRecognizer() and passing constraints to it. Constraints determine the handwriting recognition model that should be used. Currently, you can specify a list of languages in order of preference:
const recognizer = await navigator.createHandwritingRecognizer({
  languages: ['en'],
});
The current implementation on ChromeOS can only recognize one language at a time. It only supports English (en), and a gesture model (zxx-x-gesture) to recognize gestures such as crossing out words.
The method returns a promise resolving with an instance of a HandwritingRecognizer when the browser can fulfill your request. Otherwise, it will reject the promise with an error, and handwriting recognition will not be available. For this reason, you may want to query the recognizer's support for particular recognition features first.
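Because the promise can reject, it can help to handle the failure in one place. The following is a minimal sketch, assuming you want to fall back to null when no recognizer is available. The function name createRecognizerOrNull and the navigator parameter are illustrative choices, not part of the API.

```javascript
// Hypothetical wrapper: resolves with a recognizer, or with null if the
// API is missing or the browser cannot fulfill the request.
async function createRecognizerOrNull(nav, languages) {
  if (!nav || typeof nav.createHandwritingRecognizer !== 'function') {
    return null;
  }
  try {
    return await nav.createHandwritingRecognizer({ languages });
  } catch (error) {
    console.warn('Handwriting recognition is not available:', error);
    return null;
  }
}

// Usage in a browser:
// const recognizer = await createRecognizerOrNull(navigator, ['en']);
// if (!recognizer) { /* keep the keyboard-only input */ }
```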
Querying recognizer support
By calling navigator.queryHandwritingRecognizerSupport(), you can check if the target platform supports the handwriting recognition features you intend to use. In the following example, the developer wants to:
- detect text in English
- get alternative, less likely predictions when available
- gain access to the segmentation result, i.e., the recognized characters, including the points and strokes that make them up
const { languages, alternatives, segmentationResult } =
  await navigator.queryHandwritingRecognizerSupport({
    languages: ['en'],
    alternatives: true,
    segmentationResult: true,
  });
console.log(languages); // true or false
console.log(alternatives); // true or false
console.log(segmentationResult); // true or false
The method returns a promise resolving with a result object. If the browser supports the feature specified by the developer, its value will be set to true. Otherwise, it will be set to false. You can use this information to enable or disable certain features within your application, or to adjust your query and send a new one.
Due to fingerprinting concerns, you cannot request a list of supported features, such as particular languages, and the browser may ask for user permission or reject your request entirely if you send too many feature queries.
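One way to act on the query result is to derive feature flags from a single query, which also keeps the number of queries low. The sketch below assumes the result object has one boolean per queried feature, as in the example above. The function name getFeatureFlags, the navigator parameter, and the returned property names are hypothetical.

```javascript
// Hypothetical helper: sends one support query and turns the result
// into flags the rest of the application can check.
async function getFeatureFlags(nav) {
  const result = await nav.queryHandwritingRecognizerSupport({
    languages: ['en'],
    alternatives: true,
    segmentationResult: true,
  });
  return {
    canRecognize: result.languages === true,
    showAlternatives: result.alternatives === true,
    highlightCharacters: result.segmentationResult === true,
  };
}

// Usage in a browser:
// const flags = await getFeatureFlags(navigator);
// if (!flags.canRecognize) { /* hide the handwriting input */ }
```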
Start a drawing
Within your application, you should offer an input area where the user makes their handwritten entries. For performance reasons, it is recommended to implement this with the help of a canvas object. The exact implementation of this part is out of scope for this article, but you may refer to the demo to see how it can be done.
To start a new drawing, call the startDrawing() method on the recognizer. This method takes an object containing different hints to fine-tune the recognition algorithm. All hints are optional:
- The kind of text being entered: text, email addresses, numbers, or an individual character (recognitionType)
- The type of input device: mouse, touch, or pen input (inputType)
- The preceding text (textContext)
- The number of less-likely alternative predictions that should be returned (alternatives)
- A list of user-identifiable characters ("graphemes") the user will most likely enter (graphemeSet)
The Handwriting Recognition API plays well with Pointer Events, which provide an abstract interface to consume input from any pointing device. The pointer event arguments contain the type of pointer being used. This means you can use pointer events to determine the input type automatically. In the following example, the drawing for handwriting recognition is automatically created on the first occurrence of a pointerdown event on the handwriting area. As the pointerType may be empty or set to a proprietary value, I introduced a consistency check to make sure only supported values are set for the drawing's input type.
let drawing;
let activeStroke;
canvas.addEventListener('pointerdown', (event) => {
  if (!drawing) {
    drawing = recognizer.startDrawing({
      recognitionType: 'text', // email, number, per-character
      inputType: ['mouse', 'touch', 'pen'].find((type) => type === event.pointerType),
      textContext: 'Hello, ',
      alternatives: 2,
      graphemeSet: ['f', 'i', 'z', 'b', 'u'], // for a fizz buzz entry form
    });
  }
  startStroke(event);
});
The current implementation on ChromeOS does not support grapheme sets yet; they are silently ignored.
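The consistency check for the input type can also live in its own function, which makes it easy to leave the hint out altogether when the pointer type is unknown. The following is a sketch: buildHints and SUPPORTED_INPUT_TYPES are hypothetical names, while the hint names and input type values are the ones used in the example above.

```javascript
const SUPPORTED_INPUT_TYPES = ['mouse', 'touch', 'pen'];

// Hypothetical helper: builds the hints object for startDrawing() and
// omits inputType if the pointer type is empty or proprietary.
function buildHints(pointerType, textContext = '') {
  const hints = { recognitionType: 'text', alternatives: 2 };
  if (SUPPORTED_INPUT_TYPES.includes(pointerType)) {
    hints.inputType = pointerType;
  }
  if (textContext) {
    hints.textContext = textContext;
  }
  return hints;
}

console.log(buildHints('pen', 'Hello, ')); // includes inputType and textContext
console.log(buildHints('')); // neither inputType nor textContext is set
```

In the pointerdown handler you could then call recognizer.startDrawing(buildHints(event.pointerType, 'Hello, ')).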
Add a stroke
The pointerdown event is also the right place to start a new stroke. To do so, create a new instance of HandwritingStroke. Also, you should store the current time as a point of reference for the subsequent points added to it:
function startStroke(event) {
  activeStroke = {
    stroke: new HandwritingStroke(),
    startTime: Date.now(),
  };
  addPoint(event);
}
Add a point
After creating the stroke, you should directly add the first point to it. As you will add more points later on, it makes sense to implement the point creation logic in a separate method. In the following example, the addPoint() method calculates the elapsed time from the reference timestamp. The temporal information is optional, but can improve recognition quality. Then, it reads the X and Y coordinates from the pointer event and adds the point to the current stroke.
function addPoint(event) {
  const timeElapsed = Date.now() - activeStroke.startTime;
  activeStroke.stroke.addPoint({
    x: event.offsetX,
    y: event.offsetY,
    t: timeElapsed,
  });
}
The pointermove event handler is called when the pointer is moved across the screen. Those points need to be added to the stroke as well. The event can also be raised if the pointer is not in a "down" state, for example when moving the cursor across the screen without pressing the mouse button. The event handler from the following example checks if an active stroke exists, and adds the new point to it.
canvas.addEventListener('pointermove', (event) => {
  if (activeStroke) {
    addPoint(event);
  }
});
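Taken together, the stroke handling forms a small state machine. The following sketch bundles it into one object so it can be tried without a canvas. createStrokeRecorder and its createStroke and now parameters are assumptions made for this example. In the browser you would pass () => new HandwritingStroke() as createStroke. The end() method anticipates the pointerup step described in the next section.

```javascript
// Hypothetical recorder: tracks the active stroke across pointer events.
function createStrokeRecorder(createStroke, now = Date.now) {
  let active = null;

  function addPoint(event) {
    active.stroke.addPoint({
      x: event.offsetX,
      y: event.offsetY,
      t: now() - active.startTime, // milliseconds since the stroke started
    });
  }

  return {
    // Call from the pointerdown handler.
    start(event) {
      active = { stroke: createStroke(), startTime: now() };
      addPoint(event);
    },
    // Call from the pointermove handler; ignored while the pointer is up.
    move(event) {
      if (active) addPoint(event);
    },
    // Call from the pointerup handler; returns the finished stroke, if any.
    end() {
      const finished = active ? active.stroke : null;
      active = null;
      return finished;
    },
  };
}
```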
Recognize text
When the user lifts the pointer again, you can add the stroke to your drawing by calling its addStroke() method. The following example also resets the activeStroke, so the pointermove handler will not add points to the completed stroke.
If necessary, you can also use the drawing's getStrokes() method to list all strokes, and the removeStroke() method to remove a particular one from the drawing.
Next, it's time to recognize the user's input by calling the getPrediction() method on the drawing. Recognition usually takes less than a few hundred milliseconds, so you can repeatedly run predictions if needed. The following example runs a new prediction after each completed stroke.
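canvas.addEventListener('pointerup', async (event) => {
  // Ignore pointerup events that do not end a stroke started on the canvas.
  if (!activeStroke) {
    return;
  }
  drawing.addStroke(activeStroke.stroke);
  activeStroke = null;
  // Predictions are ordered by likelihood, most likely first.
  const [mostLikelyPrediction, ...lessLikelyAlternatives] = await drawing.getPrediction();
  if (mostLikelyPrediction) {
    console.log(mostLikelyPrediction.text);
  }
  // The rest element is always an array, so no optional chaining is needed.
  lessLikelyAlternatives.forEach((alternative) => console.log(alternative.text));
});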
This method returns a promise which resolves with an array of predictions ordered by their likelihood. The number of elements depends on the value you passed to the alternatives hint. You could use this array to present the user with a choice of possible matches, and have them select an option. Alternatively, you can simply go with the most likely prediction, which is what I do in the example.
The prediction object contains the recognized text and an optional segmentation result, which I will discuss in the following section.
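If you want to offer the alternatives to the user, you could first reduce the predictions to their text. This is a sketch, assuming each prediction has a text property as described above. toChoices is a hypothetical helper name, and the sample predictions are made up.

```javascript
// Hypothetical helper: splits the predictions into the best match and
// the remaining, less likely alternatives.
function toChoices(predictions) {
  const [best, ...others] = predictions;
  return {
    best: best ? best.text : '',
    alternatives: others.map((prediction) => prediction.text),
  };
}

const choices = toChoices([{ text: 'Hello' }, { text: 'Hallo' }, { text: 'Hells' }]);
console.log(choices.best); // Hello
console.log(choices.alternatives); // [ 'Hallo', 'Hells' ]
```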
Detailed insights with segmentation results
If supported by the target platform, the prediction object can also contain a segmentation result. This is an array containing all recognized handwriting segments. Each segment combines the recognized user-identifiable character (grapheme) with its position in the recognized text (beginIndex, endIndex) and the strokes and points that created it.
if (mostLikelyPrediction.segmentationResult) {
  mostLikelyPrediction.segmentationResult.forEach(
    ({ grapheme, beginIndex, endIndex, drawingSegments }) => {
      console.log(grapheme, beginIndex, endIndex);
      drawingSegments.forEach(({ strokeIndex, beginPointIndex, endPointIndex }) => {
        console.log(strokeIndex, beginPointIndex, endPointIndex);
      });
    },
  );
}
You could use this information to track down the recognized graphemes on the canvas again.
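For example, to highlight a recognized grapheme you need the points that produced it. The sketch below works on plain arrays of points per stroke, which you would have to keep yourself while recording. It assumes that endPointIndex is exclusive, that is, it marks where the next segment begins. Verify this against the spec draft before relying on it. pointsForSegment and boundingBox are hypothetical helpers.

```javascript
// Hypothetical helper: collects the recorded points that belong to the
// drawing segments of one recognized grapheme.
// Assumption: endPointIndex is exclusive.
function pointsForSegment(strokes, drawingSegments) {
  return drawingSegments.flatMap(({ strokeIndex, beginPointIndex, endPointIndex }) =>
    strokes[strokeIndex].slice(beginPointIndex, endPointIndex));
}

// Hypothetical helper: smallest rectangle containing all given points.
// Expects at least one point.
function boundingBox(points) {
  const xs = points.map((p) => p.x);
  const ys = points.map((p) => p.y);
  return {
    left: Math.min(...xs),
    top: Math.min(...ys),
    right: Math.max(...xs),
    bottom: Math.max(...ys),
  };
}
```

With the bounding box, you could draw a rectangle around the grapheme on the canvas, for instance when the user selects the corresponding character in the recognized text.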
Complete recognition
After the recognition has completed, you can free resources by calling the clear() method on the HandwritingDrawing, and the finish() method on the HandwritingRecognizer:
drawing.clear();
recognizer.finish();
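If recognition can end from several places in your app, you might put both calls into one function that tolerates a drawing or recognizer that was never created. This is a sketch only. releaseRecognition is a hypothetical name, and using try/finally so that finish() still runs if clear() throws is a design choice of this example, not a requirement of the API.

```javascript
// Hypothetical helper: frees the drawing and the recognizer, if present.
function releaseRecognition(drawing, recognizer) {
  try {
    if (drawing) drawing.clear();
  } finally {
    // Runs even if clear() throws.
    if (recognizer) recognizer.finish();
  }
}
```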
Demo
The web component <handwriting-textarea> implements a progressively enhanced editing control capable of handwriting recognition. By clicking the button in the lower right corner of the editing control, you activate the drawing mode. When you complete the drawing, the web component will automatically start the recognition and add the recognized text back to the editing control. If the Handwriting Recognition API is not supported at all, or the platform doesn't support the requested features, the edit button will be hidden, but the basic editing control remains usable as a <textarea>.
The web component offers properties and attributes to define the recognition behavior from the outside, including languages and recognitiontype. You can set the content of the control via the value attribute:
<handwriting-textarea languages="en" recognitiontype="text" value="Hello"></handwriting-textarea>
To be informed about any changes to the value, you can listen to the input event.
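Listening for that event could look like the following sketch. It assumes the element exposes its current content through a value property, mirroring the value attribute mentioned above. watchValue is a hypothetical helper, not part of the component.

```javascript
// Hypothetical helper: calls onChange with the element's value whenever
// an input event fires, and returns a function that stops listening.
function watchValue(element, onChange) {
  const listener = (event) => onChange(event.target.value);
  element.addEventListener('input', listener);
  return () => element.removeEventListener('input', listener);
}

// Usage in a browser:
// const stop = watchValue(
//   document.querySelector('handwriting-textarea'),
//   (value) => console.log('New value:', value),
// );
```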
You can try the component using this demo on Glitch. Also be sure to have a look at the source code. To use the control in your application, obtain it from npm.
Security and permissions
The Chromium team designed and implemented the Handwriting Recognition API using the core principles defined in Controlling Access to Powerful Web Platform Features, including user control, transparency, and ergonomics.
User control
The Handwriting Recognition API can't be turned off by the user. It is only available for websites delivered via HTTPS, and may only be called from the top-level browsing context.
Transparency
There is no indication if handwriting recognition is active. To prevent fingerprinting, the browser implements countermeasures, such as displaying a permission prompt to the user when it detects possible abuse.
Permission persistence
The Handwriting Recognition API currently does not show any permission prompts. Thus, permission does not need to be persisted in any way.
Feedback
The Chromium team wants to hear about your experiences with the Handwriting Recognition API.
Tell us about the API design
Is there something about the API that doesn't work like you expected? Or are there missing methods or properties that you need to implement your idea? Have a question or comment on the security model? File a spec issue on the corresponding GitHub repo, or add your thoughts to an existing issue.
Report a problem with the implementation
Did you find a bug with Chromium's implementation? Or is the implementation different from the spec? File a bug at new.crbug.com. Be sure to include as much detail as you can, simple instructions for reproducing, and enter Blink>Handwriting in the Components box. Glitch works great for sharing quick and easy repros.
Show support for the API
Are you planning to use the Handwriting Recognition API? Your public support helps the Chromium team prioritize features and shows other browser vendors how critical it is to support them.
Share how you plan to use it on the WICG Discourse thread. Send a tweet to @ChromiumDev using the hashtag #HandwritingRecognition and let us know where and how you're using it.
Helpful Links
- Explainer
- Spec draft
- GitHub repo
- ChromeStatus
- Chromium bug
- TAG review
- Intent to Prototype
- WebKit-Dev thread
- Mozilla standards position
Acknowledgements
This article was reviewed by Joe Medley, Honglin Yu and Jiewei Qian. Hero image by Samir Bouaked on Unsplash.