The Automated Transcription for Indian Music (AUTRIM) Project by NCPA and UvA


Primary aim of the project and legal statement

The aim of this project is to put freely at the disposal of students, teachers and researchers of music a new tool that enables us to take a zoomed-in look at, and listen to, North Indian music (Hindustani sangit). Thanks to the generous grants offered by the Sir Dorabji Tata Trust from 2004-2007 and 2009-2012, and decades of support from NCPA and Amsterdam University, this material is being made freely accessible to the public. The artists who have collaborated in this program are all of the highest calibre, and it goes without saying that all the material on this website is copyright protected. None of it may be copied, distributed, used in publications or divulged in any other form without prior permission of the co-ordinator of the project, Dr. Suvarnalata Rao. In the event of unauthorised use of the materials on this website, appropriate action will be taken. The project was instigated by the former chairman of NCPA, the late Dr. J.J. Bhabha. D.B. Biswas was the director of the AUTRIM project at NCPA, while Dr. Suvarnalata Rao (NCPA) and Dr. Wim van der Meer (University of Amsterdam) have done the research, the programming, the implementation and much legwork. The final assembly in the form of videos has been done by Mumbai-based filmmaker Rustom Irani and his colleague, Salil P. Kawli.

We want to thank everyone who has supported us in the project by giving advice, feedback and criticism.

History of the project

In the early 1980s a group of young men, Bernard Bel, the late James Arnold and Joep Bor, founded the International Society for Traditional Arts Research (ISTAR), based in New Delhi. Soon one of the present authors, Wim van der Meer, joined the team. In 1983 they met Dr. J.J. Bhabha of the National Centre for the Performing Arts (NCPA, Mumbai), who had a dream. This dream was to develop a system of notation that would be specifically fit to describe, analyse and even reproduce Indian music with all its fine nuances and inflections. One of the young men of ISTAR, Bernard Bel, being an IT engineer, came up with a possible solution, which was to become known as the Melodic Movement Analyser (MMA). With the help of a Ford Foundation grant and the support of NCPA, a research lab was built at NCPA itself, using a combination of AD converters, filters and other hardware (ISTAR Newsletter 3-4, 1984-85, pp. 54-59) that was hooked up to an Apple II and later an Apple III computer. This system has been refined over the decades as computers became faster and pitch perception models more advanced. The Apple system was still in use in the early 1990s when Suvarnalata Rao, the other author of this website, generated graphs for her PhD work on intonation and cognition of ras (1993). In the 1990s Wim van der Meer ported the whole implementation to the original Macintosh, which had a built-in AD converter and enough processing power to use a pitch perception model developed at the University of Leiden and implemented in their LVS software, which ran on MicroVAX computers. Later still, PRAAT, the software for phonetic research developed at Amsterdam University, was used for making graphs that could look like this (if you want to try PRAAT yourself, download it and study the manual for musicologists):

Fig. 1: PRAAT representation of a fragment of Faiyaz Khan’s alap in Lalit.

Of course, this kind of graphical representation has a long history, with Metfessel’s and Seeger’s work being the best known:

Fig. 2: Metfessel, Phonophotography (1928)

Fig. 3: Seeger’s Melograph (developed between 1949 and 1953).

The Music in Motion Solution

All these software solutions were aimed at static graphical representation of melodic music, in much the same way that staff notation is a graphical representation of music. We also worked on systems for turning the graphs into sargam-s, numerical codes or staff notation. In fact it isn’t all that difficult to do that by computer, and in the 1990s Wim van der Meer wrote a program, MeloScribe, that did just that: take a graph and turn it into a traditional notation.

At the same time we were experimenting with ways of synchronizing music and graphs so that we could hear what we were seeing and vice versa.

And then it suddenly hit us! This Music in Motion was really what we were looking for. We had already been showing movies assembled in Flash at several conferences and people were extremely enthusiastic about them. Musicians themselves were often very pleased to see their own music in this way. We came to realise that working with some adapted system of notation was simply archaic. Of course, before the days of sound recording and pitch extraction, notation was the only tool we had. But we all know how flawed it is, or, as Nicholas Cook put it, “Notation conserves music, then, but it conceals as much as it reveals” (1998: 55).

In 2010 Rustom Irani joined the AUTRIM team and with his input we came to a final definition of the look and feel of Music in Motion. After a brief trial and error phase where he experimented with different styles of graphics and video effects Rustom arrived at a solution agreed upon by everyone in the team. The important change that he implemented was that now indeed the music (or its graphical form in this case) was moving in progressive sync with the audio, while in our earlier system the graphs were static and only the cursor was moving to show the current position in the musical stream.

Explanation of the graphics of the movies

graph explanation

The central image moves from right to left, like a flowing river under the cursor, which marks the current position in the sound file. The left portion of the pitch contour shown here (at 26.7 sec) is dotted, which indicates that the pitch change is very fast. It should be noted that such fast movements do not represent fully accurate pitch measurements. This is both a technical and a cognitive issue: in fast movements we cannot perceive the pitch fully accurately, nor can the computer trace it accurately. The grey background scale positions are always at multiples of 100 cents, which means sa = 0, r = 100, R = 200, g = 300 etc. It has been asked why we do not put the lines at the ‘proper’ pitch positions for the raga (for instance G in Khammaj would be at 386 cents). There are two main reasons for this: (1) a measuring stick should always use a single unit (a carpenter doesn’t use meters of different length) and (2) there is no agreement on the ‘proper’ pitch positions of scale degrees. We may add that much of the deliberation on pitch is mere theory and does not relate to practice.
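To make the 100-cent grid concrete, here is a minimal Python sketch of how a measured frequency maps onto it. It uses the standard definition of the cent (1200 times the base-2 logarithm of the frequency ratio to sa). The function names and the 240 Hz tonic in the usage note are assumptions chosen for illustration; they are not taken from the PRAAT scripts used in the project.

```python
import math

# Labels of the twelve grid lines, one per 100 cents, starting at sa = 0.
GRID_LABELS = ["S", "r", "R", "g", "G", "m", "M", "P", "d", "D", "n", "N"]


def cents_above_sa(freq_hz, sa_hz):
    """Interval between a frequency and the tonic (sa), in cents."""
    return 1200.0 * math.log2(freq_hz / sa_hz)


def nearest_grid_line(cents):
    """Return (label, deviation): the closest 100-cent line and the
    signed distance from it in cents, ignoring the octave."""
    folded = cents % 1200.0              # fold into a single octave
    steps = round(folded / 100.0)        # nearest multiple of 100 cents
    deviation = folded - steps * 100.0   # negative means below the line
    return GRID_LABELS[steps % 12], deviation
```

With a hypothetical tonic of 240 Hz, a pure major third (frequency ratio 5/4, so 300 Hz) gives `cents_above_sa(300.0, 240.0)` of about 386 cents, and `nearest_grid_line` places it roughly 14 cents below the G line at 400. The grid stays fixed; the deviation is simply read off against it.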



Tala structure: A composition in (drut) tintala

The thicker red lines indicate sam, the thin red line(s) are khali(s), and the thin greyish lines are tali(s). Since the graphs are made in high resolution, it will depend on your monitor how clear the difference between these lines is. It is definitely worthwhile to watch the movies full-screen. Note that the tala markers have been added manually: kudos to Suvarnalata for that job, although fortunately PRAAT makes it reasonably easy to do. Unfortunately we don’t yet have a way of doing this automatically. Tala is really a virtual structure, and the sounds that are recorded do not provide simple clues that would allow the computer to recognise the structure. Also, dividing the temporal (horizontal) dimension into equal spaces doesn’t work: there are minute deviations from the equal size of matra-s, so an equally spaced grid would make the graph misleading. This is a very interesting challenge for computer-generated graphs.
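The point about unequal matra-s can be illustrated with a small Python sketch that compares hand-marked matra positions with an equally spaced grid between the same two end points. The function names and the time values in the usage note are invented for illustration; they are not measurements from the project's recordings.

```python
def equal_grid(start, end, n_matras):
    """Boundaries of n_matras equal spans between start and end (seconds)."""
    step = (end - start) / n_matras
    return [start + i * step for i in range(n_matras + 1)]


def deviations_from_equal_grid(marked_times):
    """Signed difference (seconds) between each hand-marked matra boundary
    and the equally spaced boundary with the same index.

    marked_times must include the first and the last boundary, so the two
    end points always have a deviation of zero.
    """
    if len(marked_times) < 2:
        raise ValueError("need at least two marked boundaries")
    grid = equal_grid(marked_times[0], marked_times[-1], len(marked_times) - 1)
    return [marked - ideal for marked, ideal in zip(marked_times, grid)]
```

For example, with the made-up markers `[0.0, 0.26, 0.50, 0.76, 1.0]`, the second and fourth boundaries fall about 10 ms after the equally spaced positions. Deviations of this kind are why the markers are placed by hand rather than computed.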

How the movies are made

One of the things we noticed long ago is that graphs tend to be much ‘cleaner’ if the recording is clean. Most commercial recordings have considerable interference from the accompanying instruments, and this results in rather messy graphs. Therefore the studio at NCPA has been using directional microphones for each of the musicians, which are recorded with an M-Audio FireWire 1814 to Logic Express 8 running on an iMac 27″. All the recordings are of vocalists. Some instruments can also be processed in a similar way, but that remains a future project.

The vocal track is then processed by PRAAT, for which special scripts have been created by Wim van der Meer. Much of the processing was done by Suvarnalata Rao, the final stage probably being the most laborious, as the song texts and the rhythmic structure have to be entered manually. Close cooperation between Rustom and Suvarnalata was necessary to get everything in the right place. This included adding the correct Devnagari and Roman English subtitles and fine-tuning the precise placement of the tal markers. Finally, you will notice that in many cases the graphs stop when the music goes into the fast portion and a generic image is shown. Though we can process fast music, reading it is a different matter altogether (see our article “What you hear isn’t what you see”).

The duration of the movies varies from 10 to 15 minutes. When we did the Raga Guide with Joep Bor in the 1990s, our point of reference was the marvellous 78 rpm recordings of great musicians of the past, like Faiyaz Khan, Kesar Bai Kerkar, Mogubai Kurdikar, D.V. Paluskar and Abdul Karim Khan. And undoubtedly, many of the Raga Guide recordings are jewels. But since we were no longer constrained by time limits, we asked the artists to take the time they felt necessary for giving a clear and sufficient picture of the rag, including some fast movements as well.


Though the internet is now quite able to handle the most important scripts of the world, including of course the International Phonetic Alphabet, we have chosen to keep things simple. Not everyone has an advanced computer yet, and not everyone knows how to install and activate different scripts. Therefore we simply use the transliteration of the Allied Chambers Hindi-English dictionary, completely stripped of all diacritics, rigorously and without exception. But if you need to know details of spelling you’ll find Hindi and transliteration in the glossary. Oh sorry, there is one exception to confirm the rule: the lyrics in the movies show the final mute “a”. So in the website text you will read rag, tal, but in the movies it will read raga, tala. That’s because in spoken Hindi the mute final “a” is not pronounced, but in singing it is.


We’ve also kept the notation internet-proof and very basic. The full tone material is:

S r R g G m M P d D n N S

Komal is lowercase, shuddh is uppercase. In the case of ma lowercase is shuddh, uppercase is tivr. There are no lower or upper octave markers. Grace notes are not indicated except for / (slide upward), \ (slide downward), ~ (slow oscillation, andolan). Durations have been indicated at three levels: short (no space), normal (space), prolonged (with hyphen – ): S R G (normal); SRG (fast); S – R – G – (each note long).
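The conventions above are simple enough to be read by a program. The following Python sketch is one possible reader, not a tool used in the project: the function name `parse_notation` and the dictionary layout are hypothetical, and two details are assumptions because the text does not specify them. First, both the plain hyphen and the en dash are accepted as the prolongation sign. Second, the marks /, \ and ~ are passed through as separate events, since how they attach to neighbouring notes is not stated.

```python
# Letter-to-name table following the conventions described in the text:
# lowercase is komal and uppercase is shuddh, except for ma, where
# lowercase is shuddh and uppercase is tivr.
NOTE_NAMES = {
    "S": "sa",
    "r": "komal re", "R": "shuddh re",
    "g": "komal ga", "G": "shuddh ga",
    "m": "shuddh ma", "M": "tivr ma",
    "P": "pa",
    "d": "komal dha", "D": "shuddh dha",
    "n": "komal ni", "N": "shuddh ni",
}
MARKS = {"/": "slide upward", "\\": "slide downward", "~": "andolan"}
HYPHENS = {"-", "\u2013"}  # assumption: plain hyphen or en dash


def parse_notation(text):
    """Turn a notation string into a list of note and mark events."""
    events = []
    for group in text.split():
        # Notes written together without a space are short.
        notes_in_group = sum(ch in NOTE_NAMES for ch in group)
        for ch in group:
            if ch in NOTE_NAMES:
                duration = "short" if notes_in_group > 1 else "normal"
                events.append({"symbol": ch, "kind": "note",
                               "name": NOTE_NAMES[ch], "duration": duration})
            elif ch in HYPHENS:
                # A hyphen prolongs the most recent note; a hyphen with
                # no preceding note is ignored.
                for event in reversed(events):
                    if event["kind"] == "note":
                        event["duration"] = "prolonged"
                        break
            elif ch in MARKS:
                events.append({"symbol": ch, "kind": "mark",
                               "name": MARKS[ch]})
            else:
                raise ValueError(f"unknown symbol: {ch!r}")
    return events
```

Applied to the three examples in the text, `parse_notation("S R G")` yields three notes of normal duration, `parse_notation("SRG")` three short notes, and `parse_notation("S - R - G -")` three prolonged notes.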

It must be stressed that you should listen to the music and look at the graphs; our short notes are only meant to point out some passages that we consider salient.