Polyglot Word Embeddings Discover Language Clusters

Polyglot word embeddings, obtained by training a skipgram model on a multilingual corpus, discover extremely high-quality language clusters.

These clusters can be retrieved trivially with an algorithm such as $k$-Means, giving us a fully unsupervised language identification system.
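
To make the clustering step concrete, here is a minimal sketch of $k$-Means in plain Python. It is not the implementation linked from the post: the 2-D `doc_vectors` are hypothetical stand-ins for document embeddings (which in practice would come from the trained polyglot model), and the `kmeans` and `sq_dist` helpers and the farthest-point seeding are illustrative choices, not something the post prescribes.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point seeding (an assumption)."""
    # Seed with the first point, then repeatedly add the point that is
    # farthest from every centroid chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy vectors: two well-separated groups standing in for
# documents written in two different languages.
doc_vectors = [
    (0.9, 0.1), (1.0, 0.2), (0.8, 0.0),
    (0.1, 0.9), (0.0, 1.0), (0.2, 0.8),
]

labels, centroids = kmeans(doc_vectors, k=2)
print(labels)
```

Each cluster id then plays the role of a language label. Mapping an id to a language name still requires inspecting a few documents per cluster, since the clustering itself is unsupervised.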

Experiments show that these clusters are on par with results produced by popular open-source models (FastText LangID) and commercial models (Google Cloud Translation).

We have successfully used this technique in many situations involving several low-resource languages that are poorly supported by popular open source models.

This blog post covers the methods and the intuition behind them, and links to an implementation based on 100-dimensional FastText embeddings.


Paisley Sculpture 1

A new piece: a paisley sculpture carved in relief from maple and mahogany.

The mahogany pedestal features rosettes and small chip-carved patterns incorporating incisions. The paisley piece rests in a bed of felt.

Here is a small sampling of the detail work on the pedestal.

And a short video of the carving process.



Twitter: @shriphani
Instagram: @life_of_ess
Fortior Per Mentem
(c) Shriphani Palakodety 2013-2020