We discuss language identification of noisy, romanized text, a critical but largely unaddressed problem in Indic text mining, and release a language-identification utility. We then measure the geographic extent of language use in India. This is a summary of a WNUT 2020 paper.
Third place in the Richmond carvers show for this maple panel.
A new piece carved in maple during this lockdown period.
Two projects from the COVID-19 shutdown: a coffee scoop in cherry and a spatula in jatoba.
Polyglot word embeddings, obtained by training a skipgram model on a multilingual corpus, discover extremely high-quality language clusters.
These clusters can be trivially retrieved with an algorithm like $k$-means, giving us a fully unsupervised language identification system.
We have successfully used this technique in many situations involving low-resource languages that are poorly supported by popular open-source models.
This blog post covers the method and the intuition behind it, and links to an implementation based on 100-dimensional FastText embeddings.
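The clustering step can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the linked implementation: the two-dimensional `TOY_VECTORS` are hypothetical stand-ins for the 100-dimensional FastText vectors, `kmeans` is a plain Lloyd's-algorithm implementation with a deterministic farthest-point initialization, and labeling a sentence by the majority cluster of its words is an assumed decision rule.

```python
from collections import Counter

# Hypothetical toy vectors standing in for skipgram embeddings. In the real setting
# each word would map to a 100-dimensional FastText vector trained on a mixed corpus.
TOY_VECTORS = {
    # romanized Hindi-like tokens, placed near (0, 0)
    "kya": (0.1, 0.2), "hai": (0.0, 0.1), "nahi": (0.2, 0.0), "acha": (0.1, 0.1),
    # English tokens, placed near (5, 5)
    "the": (5.0, 5.1), "is": (5.2, 4.9), "good": (4.9, 5.0), "not": (5.1, 5.2),
}

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # add the point farthest from its nearest existing centroid
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # keep the previous centroid if a cluster ends up empty
        new = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:  # converged: assignments no longer move the centroids
            break
        centroids = new
    return centroids

def assign(vec, centroids):
    """Index of the centroid closest to vec."""
    return min(range(len(centroids)), key=lambda i: sq_dist(vec, centroids[i]))

def identify(sentence, vectors, centroids):
    """Majority cluster over the sentence's known words; None if no word is known."""
    votes = Counter(assign(vectors[w], centroids)
                    for w in sentence.lower().split() if w in vectors)
    return votes.most_common(1)[0][0] if votes else None

words = list(TOY_VECTORS)
centroids = kmeans([TOY_VECTORS[w] for w in words], k=2)
clusters = {w: assign(TOY_VECTORS[w], centroids) for w in words}
print(identify("kya acha hai", TOY_VECTORS, centroids))      # prints 0
print(identify("the food is good", TOY_VECTORS, centroids))  # prints 1 ("food" is skipped)
```

Note that $k$-means only returns anonymous cluster IDs; mapping cluster 0 or 1 to a language name still takes a quick look at a few member words of each cluster.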
The 2019 portfolio:
- Paisley hairpin
- Rose in maple
- Orchid corsage
- Paisley in maple
- Lily in basswood
- Mapley paisley and rest
- Foliage in cypress
- Mahogany rest
- Rococo acanthus in basswood
Do read it and let me know what you think. I am also open to ideas for collaborations and that sort of thing.
I just finished rendering the Gridded Population of the World (GPW) dataset. I chose the cyan-on-black color theme since it is a staple of all my cartography projects. Some insights that people pointed out:
Maple, aged mahogany.
We just finished this basswood lily for a client. Simple, subtle, and full of life. A glorious amber shine envelops it. A perfect gift for a soulmate.