Microsoft has released a series of new machine-learning APIs from its AI-focused Project Oxford. There are four APIs in all, covering faces, speech, vision, and language. Technical demonstrations are available for all four, but it was Microsoft’s publicly accessible (and fun) How-Old.net demo site, which shows some of the APIs in action, that went viral shortly after the announcement.
The new APIs are designed to be integrated into both websites and mobile apps, and Microsoft has produced SDKs for Android and .NET, plus an iOS SDK for the Speech API. The APIs are REST-based, so they can be used from most platforms with an HTTP library and a data connection.
Project Oxford’s AI-focused APIs include those related to vision and speech
Here’s what Microsoft released:
- Speech: Provides speech-to-text conversion from a streaming audio source, recognises spoken commands, and performs text-to-speech conversion. See the demo here.
- Language: Closely linked to the Speech API, this allows speech to be used contextually, particularly in relation to IoT devices or wearables, where voice commands are more useful. Microsoft already provides contextual voice commands in its Cortana virtual assistant software. Learn more about it here.
- Face: As used in How-Old.net, this API can detect faces, estimate a person’s age, verify whether faces in two different images belong to the same person, and provide a search system based on face recognition. See the demo here.
- Vision: Used for moderation purposes, the Vision API can flag images containing adult content or depicting abuse. Alternatively, it can generate ideal thumbnails from larger images, or extract text using OCR. See the demo here.
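Because the APIs are plain REST endpoints, calling one from any platform amounts to building an authenticated HTTP POST. The sketch below shows what a Face API detection request might look like in Python, using only the standard library. The endpoint URL, the `Ocp-Apim-Subscription-Key` header name, and the JSON body shape are assumptions based on how the Project Oxford beta was documented, not guaranteed details.

```python
import json
import urllib.request

# Assumed beta endpoint for face detection; treat as illustrative only.
FACE_DETECT_URL = "https://api.projectoxford.ai/face/v1.0/detect"

def build_detect_request(subscription_key, image_url):
    """Build an HTTP POST asking the Face API to detect faces in an
    image hosted at image_url. Returns a urllib Request object."""
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Azure-style subscription header; the exact name is an assumption.
        "Ocp-Apim-Subscription-Key": subscription_key,
    }
    return urllib.request.Request(
        FACE_DETECT_URL, data=body, headers=headers, method="POST"
    )

# Actually sending the request needs a valid key and a data connection:
# with urllib.request.urlopen(
#         build_detect_request("YOUR_KEY", "https://example.com/photo.jpg")
#     ) as resp:
#     faces = json.load(resp)  # expected: a JSON list of detected faces
```

Separating request construction from sending makes the call easy to test offline, and the same pattern (JSON body plus a key header) applies to the other three APIs.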
Microsoft’s Project Oxford APIs are available as a public beta, but only to those with an Azure cloud account, and each one is limited to a restricted number of transactions. The final pricing for use of the APIs has yet to be announced.