The majority of languages spoken in modern South Asia belong to four major families. Most people in the north today speak Indo-European languages while those of the south are largely Dravidian speakers. The peoples of the Himalayan regions mainly speak Sino-Tibetan languages and in some tribal regions of India there are a small number of Austro-Asiatic speakers. In a few small isolated areas, people speak languages that have no known living relatives; once there were probably many more such languages.
Written sources provide some evidence of the languages spoken in the past. The earliest Dravidian literature, the Sangam poems, written in Old Tamil, comes from the south and dates from the last centuries BCE and the early centuries CE; the earliest Indo-Aryan inscriptions are a few centuries older, but there is also a substantial body of earlier Indo-Aryan oral literature, faithfully preserved by exact repetition and later written down. The earliest of these texts, the Rigveda, was composed during the second millennium BCE, beginning some time between 1700 and 1500 BCE.
Other information on languages spoken in early South Asia can be extracted by studying the ways in which languages have influenced each other and the relationships between South Asia's languages and those of other parts of Eurasia. Additional information can be gleaned from place-names and the modern distribution of the languages.
Indo-Aryan. The latest of these languages to arrive in South Asia was Indo-Aryan. By the late first millennium BCE, there were speakers of languages belonging to the Indo-European language family from Britain to Sri Lanka and Central Asia (although these regions were also inhabited by speakers of non-Indo-European languages). Much ink has been spilled on the question of how this wide distribution came about. The majority of scholars now accept that the Indo-Iranian branch of the family spread into eastern Iran and northern South Asia during the second millennium BCE, from regions farther to the north. This is supported by linguistic and textual evidence, the former including the relationship of the eastern, Indo-Iranian branch to other Indo-European languages, the close relationship of Old Iranian and Old Indo-Aryan (Sanskrit), the earliest Indo-European languages of eastern Iran and northern South Asia respectively, and their subsequent divergence. Similarly the earliest extant religious texts of these two linguistic groups, the Old Avesta and the Rigveda, show striking cultural similarities, reflecting a close relationship between the groups, while the Young Avesta and later Vedic texts indicate growing cultural divergence. None of these texts can be closely dated, but most unbiased scholars assign dates in the range 1700-1000 BCE to the various parts of the Rigveda and around 1400 BCE to the earliest Avestan texts. Loanwords for a number of things related to agriculture and settled life, such as camels, canals, and bricks, present in both Indo-Aryan and Iranian, reflect a period before the two languages separated, when the speakers of Proto-Indo-Iranian were in close contact with the Bactria-Margiana Archaeological Complex (BMAC) in Bactria and Margiana, dated around 2100-1700 BCE. Although much of the detail has still to be established, it is clear that previously the speakers of Proto-Indo-Iranian had occupied an area to the north of the BMAC. As their descendants gradually migrated into Iran and northern South Asia during the second millennium, their languages diverged to form Iranian and Indo-Aryan. A small third branch, comprising the Nuristani (Kafir) languages, became established in the Hindu Kush but did not spread more widely.
The geographical information in the Indian sacred texts, particularly the Vedas, attests to the presence of Indo-Aryan speakers first in parts of Afghanistan, Seistan, Swat, and the Punjab and to their gradual spread south and east within the subcontinent. Gradually Indo-Aryan languages came to be spoken over most of the subcontinent except for the south, as well as in Sri Lanka. These included both Sanskrit, the educated language of the sacred texts, which remained relatively pure, and a variety of Prakrits, Indo-Aryan languages heavily influenced by the phonology of the Dravidian languages.
While it is possible that a few speakers of Proto-Indo-Iranian traveled to the Indus region during the third millennium, linguistic history firmly rules out any suggestion that an Indo-Aryan language was spoken by the Harappans. Furthermore, the Indo-Aryan texts reflect a society very different from that of the urban Indus civilization—small warlike groups of pastoral nomads to whom the domestic horse was supremely important; there were no horses in the subcontinent in the Harappan period.
Dravidian. The other main language family of the subcontinent is Dravidian. The majority of Dravidian speakers now live in the four southern states, forming the South Dravidian branch. Small groups in Maharashtra and Orissa, speaking Kolami, Naiki, Parji, and Gadaba, form the tiny Central Dravidian branch. Kurukh and Malto are spoken by groups in parts of eastern central India; according to their own traditions, they moved there in historical times from southern Gujarat, to which they had earlier migrated from the middle Ganges Basin. These languages belong to the Northern Dravidian branch, which also includes Brahui, spoken far to the north, in the Brahui Hills of southern Baluchistan. It is uncertain whether Brahui speakers are the last in situ remnants of a widespread Dravidian-speaking population that originally extended this far north or migrants into Baluchistan from Gujarat in historical times, though the latter alternative is thought more likely. There is good evidence that at some time Dravidian languages were spoken over most of western India and in much of Sindh as well as in the peninsula; this is shown by the distribution of Dravidian place-names and of certain shared features in the modern languages across these regions.
All these languages descend from Proto-Dravidian, which split first into North Dravidian and Proto-South-Dravidian. The latter separated into Central and South Dravidian, probably by 1500 BCE. Finally, by around 1100 BCE, South Dravidian split in two: the southern branch (Kannada, Malayalam, Tamil, and a number of small languages) in Karnataka, Kerala, and Tamilnadu, extending into northern Sri Lanka; and the south-central branch, made up largely of Telugu speakers in Andhra Pradesh but also including small groups, mainly Gondi speakers, in Maharashtra, Madhya Pradesh, and Orissa.
Austro-Asiatic. Some Indian communities still speak languages belonging to the Austro-Asiatic language family, which also includes the Mon-Khmer languages of Southeast Asia. Most—including the principal ones, Mundari and Santali—belong to the Munda group that is spread over much of Bihar, West Bengal, and Orissa but that is also spoken on the upper Tapti River in central India. The languages of the Nicobar Islands form a separate Austro-Asiatic branch. An extinct language, Para-Munda in the northwest, may represent a separate branch of Austro-Asiatic.
Sino-Tibetan. Another great language family that extends into the subcontinent is Sino-Tibetan, spoken from the Himalayas to China and Southeast Asia; the languages in South Asia belong to its Tibeto-Burman branch. Their present distribution in Tibet, Burma, and adjacent regions probably reflects, in broad terms, that of the past.
Other South Asian Languages. In a few pockets, mainly in the mountains and in tribal areas, languages survive that have no known relatives, including Burushaski in the western Karakorum. Others are now extinct, including Kusunda in central Nepal, Vedda and Rodiya in Sri Lanka, and probably the original language of the hunter-gatherer Tharu in the Himalayan foothills. Nahali is spoken along the Tapti River and in the Aravalli and Vindhya Hills. This seems to have been a language isolate, overlain successively by Austro-Asiatic, Dravidian, and Indo-Aryan layers; about a quarter of its vocabulary derives from the original language.
It is likely that in the past many unrelated languages were spoken in different parts of the subcontinent. Work by Masica (1979) identified an unknown language, dubbed Language X, once present in Uttar Pradesh and Bihar. Words from this language that have entered the vocabulary of Dravidian and Indo-Aryan include the names of many indigenous flora and fauna, demonstrating that it was spoken in the region before either of them arrived.
Neighboring Languages. Languages spoken in neighboring regions in Harappan times included those of the BMAC and Namazga cultures in Bactria, Margiana, and Turkmenia. Elamite was spoken in the west of the Iranian plateau, and unrelated languages, about which nothing is known, were spoken in Marhashi and Aratta farther east, and there were probably many others. Elamite was probably a language isolate, although David McAlpin put forward the theory, not widely accepted by linguists, that it shared a common ancestor, Proto-Elamo-Dravidian, with the Dravidian language family. Farther west, the people of southern Mesopotamia spoke Sumerian, another language isolate, and Akkadian, part of the large family of Semitic languages to which most present Near Eastern languages belong. The languages of the Gulf are unknown. Across the whole region, there were probably a large number of languages, including many that have now died out and some that had no living relatives even then.
Identifying the Harappan Language
Language Change. Languages spoken by groups who are in frequent contact influence each other in many ways. Changes generally take place in the context of bilingualism, when speakers of one language use the vocabulary of the other with the structures and sounds of their own. Evidence of such substrate influences can be traced in various ways, notably from the presence of loanwords and from the adoption of features of the phonology, grammar, and syntax of one language by the other. These data can be used to study past distributions of known languages and to detect the existence of languages that have otherwise completely disappeared. The chronological order of the languages in a region can be established by looking at loanwords, since incomers generally adopt the indigenous names for plants, animals, landscape features, objects, practices, and other things that are unfamiliar to them, and they often use or modify existing place-names. Loanwords can also give some idea of cultural and economic differences between the speakers of different languages.
The linguistic history of the Indo-Aryan languages has been intensively studied from the literary sources, and much work has gone into reconstructing the Proto-Dravidian language, but much less is known about the other languages of the subcontinent. Considerable work has been done on identifying the substrate influences on Indo-Aryan, far less on those affecting Dravidian and Munda.
Substrate Influences on Indo-Aryan. The Indo-Aryan language was the most recent to arrive, and the geography and relative chronology of substrate influences on it can be quite closely traced in its surviving oral literature. Around three hundred and eighty non-Indo-European words in the Rigveda reflect the early influence of a considerable number of languages, including Proto-Burushaski, Tibeto-Burman, Munda, Dravidian, and others. One feature that characterizes South Asian languages in general is the use of retroflex consonants; the Indo-Aryan languages are the only Indo-European group to include retroflex consonants in speech, and it is clear that this feature was acquired after Old Indo-Aryan entered the subcontinent.
Many linguistic features show an early substratum influence of Dravidian on Indo-Aryan, pointing to extensive contact between speakers of these languages during the Vedic period and suggesting that early Dravidian is the obvious candidate for the Harappan language. This, however, is challenged by Witzel (1999a), whose studies of the Rigveda lead him to identify three phases in the composition of its ten books. (Dating of the Rigveda is not precise but he suggests approximate dates for these of 1700-1500, 1500-1350, and 1350-1200 BCE.) In the first phase he identifies the main influence as coming from a non-Dravidian language, which was also the source of the majority of the loanwords acquired in the later phases. The word structure (particularly the use of prefixing) suggests this to have been an Austro-Asiatic language, which he calls Para-Munda; this reflects the existence of a hitherto unrecorded western branch of Austro-Asiatic, which he traces in the eastern Punjab, Haryana, and areas farther east. Loanwords also included some that probably came from Language X, spoken farther east in the Ganges Basin; these had probably been borrowed from this language into Para-Munda at an earlier date.
Though Para-Munda remained the major influence in Witzel's second phase, as the geographical horizons of the Indo-Aryan speakers broadened, a slight Dravidian substrate influence becomes apparent, probably acquired in Sindh, and this increased in his third phase. There were also loanwords from Language X, spoken in Uttar Pradesh and Bihar, and from Tibeto-Burman languages spoken along the Himalayan foothills. It is likely that the sacred Vedic texts were kept relatively free of "contamination" with non-Indo-Aryan language; the substrate influences apparent in the Rigveda must therefore reflect very considerable interaction and bilingualism between Indo-Aryan speakers and the indigenous population. Far more substrate influence is visible in the later Vedas and other early literature. Dravidian had a strong influence on later Indo-Aryan vocabulary, morphology, and syntax; most of the loanwords are South Dravidian.
Where Were the Dravidians? The prehistory of the Dravidian languages is not at all clear. Some information can be gleaned from the vocabulary of Proto-Dravidian, reconstructed on the basis of cognate words present in both a North Dravidian and a South and/or Central Dravidian language; these words must reflect something of the situation of Dravidian speakers before the branches separated. In contrast, although some developments that occurred after North Dravidian split off and before South and Central Dravidian separated may be reflected in cognates between nonadjacent members of the two branches, their proximity means that some of the shared words may be due to later contact.
Fuller (2007) has looked closely at the botanical vocabulary reconstructed for Proto-Dravidian and considers it to be characteristic of the Dry Deciduous forest zone of central and peninsular India, stretching from Saurashtra into the south and possibly extending into adjacent savannah regions. This, then, should represent the general area in which Dravidian speakers were living before the branches separated. According to Southworth (2005b), the Proto-Dravidian vocabulary indicates an economy with hunting, animal husbandry, and agriculture, including the use of the plow; though in general it seems to reflect a village existence, a few words suggest something more, including words for an upper story and for a beam, as well as others reflecting metallurgy, some degree of social stratification, trade, and some kind of payment of dues (perhaps taxes or contributions to religious ceremonies). These are not specific enough to pin down the Proto-Dravidian speakers, but archaeological communities that could be accommodated within this picture include the Neolithic/Chalcolithic cultures of western Rajasthan, the Deccan, and the peninsula in the third and second millennia, as well as the inhabitants of Harappan Saurashtra, who were part of the state but largely rural (Sorath Harappan). This is compatible with the evidence from the Rigveda on the chronology and geography of Dravidian influences on Indo-Aryan. The ancestors of the Proto-Dravidian speakers may therefore have been among the indigenous groups in these regions when pastoralists and farmers from the Indo-Iranian borderlands entered the Indus region.
An alternative, though perhaps less likely, scenario is that these colonists themselves were speakers of early Dravidian; Southworth does not rule out the possibility that ancestors of the Proto-Dravidian speakers could have come from outside the subcontinent. Not much substrate evidence survives in the Rigveda to indicate what language was spoken in Sindh in the second half of the second millennium, but on the basis of a few words Witzel suggests a language related to Para-Munda but with dialectal differences. This he calls Meluhhan, since that was the language spoken by the Harappans who traded with Mesopotamia, where it required a translator. He notes the presence of a number of Meluhhan words in southern Mesopotamian texts, none of which seem likely to have a Dravidian etymology. However, it is probable that sesame oil was among the Harappan exports to Mesopotamia; this oil was known as ilu in Sumerian and ellu in Akkadian, closely similar to a South Dravidian name for sesame, el or ellu, which may suggest that the oil was introduced under a Dravidian name; Witzel, however, points to the Para-Munda word for wild sesame, *jar-tila, as a possible alternative source.
Parpola (1994) suggested that the name "Meluhha" might come from two Dravidian words signifying "highland country," though the term mel-akam itself is not attested in any Dravidian language. Zvelebil (1972) provides evidence that may support this, showing that many Dravidian-speaking groups right across the subcontinent call themselves by names that mean something like "people of the hills." Variations on the name "Meluhha" reappear in Vedic and later literature, applied to communities not speaking Indo-Aryan. Particularly close is milakkha, in the Pali dialect spoken in western north India. Though the name might suggest a formative period in Baluchistan, there are suitable alternative mountain areas, such as the Eastern Ghats.
Other Languages. Substrate influences on Dravidian show that it arrived later in the subcontinent than Austro-Asiatic since many terms for native flora and fauna in Dravidian are loanwords, generally thought to come from Munda, though Para-Munda is now an alternative.
A few other words do not have etymologies traceable in any of the languages of the subcontinent. These include the word for wheat; this appears in Dravidian, Para-Munda, and Indo-Aryan in various forms, all of which can be derived from a Near Eastern original. It is therefore likely that this handful of foreign words came in with the Near Eastern domesticates (with or without associated settlers) in the eighth or seventh millennium BCE.
The Harappan Languages. The people of the Indo-Iranian borderlands were a major, and perhaps the predominant, component of the Harappan population. Biological evidence (discussed in Chapter 4) indicates that the population of the borderlands originally formed a biological continuum with other contemporary South Asians. Some biologically distinct individuals settled there sometime between 6000 and 4500 BCE, though cultural continuity was unbroken. Thereafter there was apparently no change in the population until the Post-Harappan period. It is possible that the fifth-millennium immigrants introduced a new language to the subcontinent; given that their biological affinities seem to have been with people living on the Iranian plateau, it is possible that their language was related to the pre-Iranian languages of this region. Some parts of Book 8 of the Rigveda relate to eastern Iran and Seistan, where names belonging to the pre-Iranian-speaking substrate sound like Dravidian. This may suggest that the new arrivals in the fifth millennium were early Dravidian speakers.
There are therefore several possible scenarios for the linguistic situation in the Indus region during the Harappan period. Para-Munda, spoken in the Punjab at the time when the Rigvedic Aryans arrived and seemingly also by the Late Harappan settlers who were moving eastward into the Ganges region, must have been in the subcontinent for a considerable period. If the area where it was spoken in the Pre-Harappan period included the Indo-Iranian borderlands, then it is likely that Para-Munda was the main Harappan language, at least in the Punjab and probably throughout the civilization, and that Dravidian was a language spoken by the indigenous inhabitants of the west, possibly as far northwest as Saurashtra. In this case the language of the Post-Harappans in Gujarat may have developed into the North Dravidian branch.
Alternatively, Para-Munda may have been the language spoken by the hunter-gatherer-fisher communities that inhabited the Indus region before the people of the borderlands settled in the plains. If the newcomers to the region in the fifth millennium were Dravidian speakers, then it is possible that a Dravidian language was spoken by at least some of the farmers and pastoralists of the borderlands who settled in the plains, and therefore by some Harappans, but that Para-Munda remained the main language of many Harappan inhabitants of the Punjab.
Studies of the Harappan script indicate that it was used to write a single language. It seems plausible that the overarching cultural unity of the Harappans would be matched by the existence of an official language, used in writing and spoken as a lingua franca throughout the Harappan realms. Nevertheless, it is quite possible that one or several other languages were also spoken in the Harappan state, specific to different regions or occupational groups, reflecting the different communities that had come together in its formation. Prolonged bilingualism is known to have occurred in other areas, for example in Mesopotamia where Sumerian and Akkadian coexisted for many centuries: though they belonged originally to the south and north parts of southern Mesopotamia (Sumer and Akkad), educated people from both regions spoke both languages.