By Sohaib Baig
In this essay, I reflect on the possibilities of analyzing metadata of Islamic manuscript collections to better understand Islamic legal history.
This is inspired in general terms by steadily growing conversations between two fields of inquiry: the history of the book (and manuscript studies) and Islamic legal studies.[1] Some of the questions that historians of the book pursue, such as the material histories of texts, the transregional circulation of texts, the politics of reading publics, and, more generally, the histories of information and communication, can provide useful insights for scholars of legal history. These can extend to topics such as the materiality of legal culture, the transmission of legal knowledge over time and place, and the social spaces where legal authority is constructed and contested.
With the growth of new digital tools, scholars and researchers have started working with metadata from library catalogs and bibliographies to pursue quantitative questions pertaining to bibliographic data science, statistical codicology and paleography, and textual network analysis.[2] Such works are often (but not only) concerned with analyzing trends in the production and circulation of texts, correlating specific codicological or paleographical attributes of manuscripts to specific historical contexts and geographies, and mapping relationships between texts (or historical textual transmission).
In the Islamic studies context, there has been some consideration of similar questions in recent years. Working from a subset of data in the Fihrist Union Catalogue, Huw Jones and Yasmin Faghihi used computational methods to determine how certain manuscript features (such as script) may correspond to the listed manuscript provenance.[3] In a different vein, Christopher Bahl compiled data from multiple manuscript libraries from South Asia, Cairo, Istanbul, and the UK to analyze Arabic manuscript culture in the early modern Indian Ocean.[4] In a previous essay for the Islamic Law Blog, I surveyed Ḥanafī texts by South Asian scholars within a union catalog of Turkish manuscript collections that contained more than 300,000 records of manuscripts.[5] Another body of work has combined the metadata of manuscripts with images to pursue deeper paleographical and codicological queries and to train algorithms to execute a variety of functions.[6] For instance, L.W. Cornelis van Lit analyzed the dimensions of the closing manuscript flap (a feature of Islamic manuscripts) to determine which range of angles was most common within a collection of 2,000 digitized Islamic manuscripts.[7]
As scholars have warned, amassing and analyzing metadata from sources such as library catalogs or bibliographies to understand historical manuscript cultures and legal history is no straightforward matter.[8] Catalogs do not provide a direct, unfiltered window into historical manuscript culture for many reasons, including the fact that they represent the collective efforts of all the catalogers who composed them and the collectors who brought them together.[9] Historical catalogs, booklists, and inventories are also far from uniform in format.[10] Contemporary catalogs of manuscript libraries do not adhere globally to one universal standard for metadata, which makes it difficult to integrate multiple catalogs within a single database.[11] Additionally, the different approaches towards cataloging manuscript compilations or miscellanies (majmūʿas), as based on the text or the codex, can present further complications.
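To make the text-versus-codex problem concrete, here is a minimal Python sketch. Everything in it is hypothetical: the two toy catalogs, their field names (`shelfmark`, `titles`, `call_no`, `work`), and the placeholder titles are invented for illustration and do not reflect any actual catalog or the schema of any database discussed in this essay. It maps two differently structured catalogs onto one record type and shows that the same holdings give different totals depending on whether one counts texts or codices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextRecord:
    """One text inside one physical codex (hypothetical common schema)."""
    shelfmark: str  # identifies the physical codex
    title: str      # identifies the text copied in that codex


# Hypothetical catalog A: one row per codex, with all its texts listed together.
catalog_a = [
    {"shelfmark": "A-1", "titles": ["Text A"]},
    {"shelfmark": "A-2", "titles": ["Text B", "Text C", "Text D"]},  # a majmūʿa
]

# Hypothetical catalog B: one row per text, so a majmūʿa spans several rows.
catalog_b = [
    {"call_no": "B-7", "work": "Text A"},
    {"call_no": "B-7", "work": "Text E"},
    {"call_no": "B-9", "work": "Text B"},
]


def from_catalog_a(rows):
    """Expand codex-level rows into one TextRecord per text."""
    return [TextRecord(r["shelfmark"], t) for r in rows for t in r["titles"]]


def from_catalog_b(rows):
    """Rename text-level fields to match the common schema."""
    return [TextRecord(r["call_no"], r["work"]) for r in rows]


records = from_catalog_a(catalog_a) + from_catalog_b(catalog_b)

text_count = len(records)                           # counting by text: 7
codex_count = len({r.shelfmark for r in records})   # counting by codex: 4

print(f"records counted by text:  {text_count}")
print(f"records counted by codex: {codex_count}")
```

Even in this toy case the two totals diverge (7 versus 4), which is why any aggregate figure drawn from merged catalogs should state which unit it counts.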
Most of these works have thus far analyzed regional union catalogs or individual datasets assembled by the researchers themselves. Still, scholars have remained cautiously optimistic about future research possibilities.[12] The hope is to uncover historical features that reveal themselves only at scale. The question remains: what could we achieve by analyzing a comprehensive database of Islamic manuscripts from around the world?
For a long time, answering such a question was only a dream, but a massive project undertaken by the Muʾassasat ʿIlm li-Iḥyāʾ al-Turāth wa-l-Khidmāt al-Raqmiyya (The ʿIlm Foundation for the Revival of Heritage and Digital Services) in Cairo, Egypt, has recently made remarkable strides towards realizing it.[13] Its research team, which has comprised 400 individuals over 20 years,[14] has constructed a database that thus far contains more than 2 million records of Islamic manuscripts, including 1.5 million for Arabic manuscripts.[15] These records are stated to represent manuscript collections from 3,000 libraries across 65 countries.[16] Of these 65 countries, the Muʾassasat ʿIlm team traveled to 30 themselves and visited a total of 450 libraries.[17]
Muʾassasat ʿIlm has taken steps to stabilize and standardize categories across these libraries, such as author names, subjects, scribes, and commentaries, based on its own internal conventions. Its team has also at times enhanced the metadata from existing catalogs by examining the manuscripts directly, through on-site visits or digital surrogates. Each record indicates the source of its metadata and whether it contains enhancements from the team.
While the Muʾassasat ʿIlm platform is currently available only to researchers on-site in Cairo, the foundation has published two special, derivative databases concerning the manuscript corpus of the hadith compilation al-Jāmi' al-Ṣaḥīḥ of al-Bukhārī (d. 256/870) (7,541 manuscript records) and the works of the Damascene scholar Ibn Taymiyya (d. 728/1328) (2,678 manuscript records). These are open-access and provide a glimpse of its metadata standards. The level of metadata for any given record varies based on the source catalog and whether the team had access to the physical manuscript or a digital surrogate. Typically, an entry for a manuscript will include information such as the number of pages, the number of lines, the scripts present, endowment and ownership statements and stamps, and other assorted details on the author (including their legal affiliation), scribe, library, and more.
For example, a single manuscript record can include data regarding the owning library, author, title, scribe, the dates (days/months/years/centuries) associated with the author's birth and death, the date when the author originally finished composing the text, the date when the manuscript was copied, the various audition notes present on the manuscript, information on the endowments, and so forth. The codicological data is more limited, depending on the sources used: a record typically does not include information on features such as quires, chain lines and laid lines, and related details.
In lieu of an in-depth analysis, the rest of this essay shares basic highlights from the database.[18]
Here is the breakdown of the top fifteen countries (out of 65) that Muʾassasat ʿIlm has published thus far:[19]
| Rank | Country | Number of Manuscript Records | Percentage of database (2,019,488 total records) |
| 1. | Iran | 477,303 | 23.63% |
| 2. | Turkey | 340,000 | 16.83% |
| 3. | Egypt | 197,841 | 9.79% |
| 4. | Saudi Arabia | 120,000 | 5.94% |
| 5. | Pakistan | 93,973 | 4.65% |
| 6. | Iraq | 88,517 | 4.38% |
| 7. | India | 75,000 | 3.71% |
| 8. | Morocco | 71,497 | 3.54% |
| 9. | Mali | 62,594 | 3.09% |
| 10. | United Kingdom | 35,000 | 1.73% |
| 11. | Syria | 32,446 | 1.60% |
| 12. | Tunisia | 31,591 | 1.56% |
| 13. | Yemen | 31,073 | 1.53% |
| 14. | United States | 30,219 | 1.49% |
| 15. | Uzbekistan | 29,000 | 1.43% |
It is difficult to draw conclusions from this table about manuscript cultures, given the historical movement of materials and collections between and beyond the areas now constituted by these countries. Muʾassasat ʿIlm is also still processing data from some of these countries, including India and Iraq. However, the table does provide a useful perspective on the geographical distribution of manuscript collections across these countries. For many reasons, the sizeable collections in countries like Pakistan, India, and Mali may not receive as much attention from Islamic legal studies scholarship as collections in other countries, such as the United Kingdom, Egypt, and Turkey.
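For readers who wish to check or extend the percentage column, the following is a minimal Python sketch that recomputes each country's share from the published counts. The counts and the database total are taken from the table above; the variable and function names are my own. Recomputed values may differ in the last digit from the published column, depending on whether figures are rounded or truncated.

```python
# Total number of records reported for the database.
TOTAL_RECORDS = 2_019_488

# Counts as published for the top fifteen countries.
records_by_country = {
    "Iran": 477_303,
    "Turkey": 340_000,
    "Egypt": 197_841,
    "Saudi Arabia": 120_000,
    "Pakistan": 93_973,
    "Iraq": 88_517,
    "India": 75_000,
    "Morocco": 71_497,
    "Mali": 62_594,
    "Syria": 32_446,
    "United Kingdom": 35_000,
    "Tunisia": 31_591,
    "Yemen": 31_073,
    "United States": 30_219,
    "Uzbekistan": 29_000,
}


def share(count, total=TOTAL_RECORDS):
    """Return count as a percentage of total."""
    return 100 * count / total


# Sort by count so the ranking follows directly from the figures.
ranked = sorted(records_by_country.items(), key=lambda kv: kv[1], reverse=True)

for rank, (country, count) in enumerate(ranked, start=1):
    print(f"{rank:>2}. {country:<15}{count:>10,}{share(count):>8.2f}%")

# How much of the whole database do these fifteen countries account for?
top15_total = sum(records_by_country.values())
print(f"Top fifteen combined: {top15_total:,} ({share(top15_total):.2f}%)")
```

On these figures, the fifteen listed countries together hold roughly 85% of all records, so the remaining fifty countries share about 15% among them.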
The database provides the option of filtering by subject. When I filtered for Islamic law ("fiqh"), the database displayed 298,019 records (14.75% of the total records). The results can be further subdivided by Islamic legal school (madhhab).
Here is the breakdown when filtered by madhhab:
| Rank | Legal School | Number of Manuscript Records |
| 1. | Ḥanafī | 69,080 |
| 2. | Mālikī | 32,529 |
| 3. | Imāmī/Ja'farī | 21,696 |
| 4. | Shāfi'ī | 20,765 |
| 5. | Ḥanbalī | 2,089 |
| 6. | Zaydī | 1,986 |
| 7. | Ibāḍī | 1,507 |
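The relative sizes of these corpora can be compared with a short Python sketch. The counts come from the table above and the fiqh total from the subject filter; the variable names are my own. Note one assumption made explicit in the code: I do not assume that the madhhab categories partition the fiqh records, since the database also offers cross-madhhab categories and records may be tagged in more than one way.

```python
# Records returned by the "fiqh" subject filter.
FIQH_RECORDS = 298_019

# Counts per legal school as displayed by the database.
madhhab_records = {
    "Ḥanafī": 69_080,
    "Mālikī": 32_529,
    "Imāmī/Ja'farī": 21_696,
    "Shāfi'ī": 20_765,
    "Ḥanbalī": 2_089,
    "Zaydī": 1_986,
    "Ibāḍī": 1_507,
}

ranked = sorted(madhhab_records.items(), key=lambda kv: kv[1], reverse=True)

# Compare the largest corpus with the next two combined.
largest_name, largest = ranked[0]
next_two = ranked[1][1] + ranked[2][1]
print(f"{largest_name}: {largest:,} vs. next two combined: {next_two:,}")

# Sum of all madhhab-tagged records, set against the fiqh total.
# Assumption: categories may overlap, so this is a rough ratio only.
tagged = sum(madhhab_records.values())
print(f"Madhhab-tagged records: {tagged:,} "
      f"({100 * tagged / FIQH_RECORDS:.1f}% of fiqh records)")
```

The Ḥanafī count (69,080) exceeds the Mālikī and Imāmī/Ja'farī counts combined (54,225), and the seven madhhab categories together come to 149,652 records, about half of the fiqh total.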
Besides madhhab, the database provides access to other categories that combine legal writing across madhhabs. See below:
| Rank | Subject | Number of Records |
| 1. | Commentaries and Glosses (Shurūḥ al-Mutūn wa Ḥawāshiha) | 41,947 |
| 2. | Legal Theory (Uṣūl al-Madhāhib al-Fiqhiyya) | 41,384 |
| 3. | Fatwās | 19,074 |
| 4. | Inheritance (Farāʾiḍ) | 11,987 |
How might historians and legal scholars explain the data in these tables? What particular legal, social, and material factors might help make sense of these numbers? Results such as these can lead down many paths of inquiry. To take the very first category, one might say that the enormous size of the Ḥanafī corpus – larger than the next two madhhabs combined – likely reflects the historically vast extent of the madhhab across West Asia and Southeastern Europe. This much may be straightforward; yet when we check the distribution of the collections today, we see a slightly different picture.
| Top Nine | Country (of Library where the Manuscript on Ḥanafī Law is Held) | Number of Records |
| 1. | Turkey | 23,900 |
| 2. | Egypt | 11,601 |
| 3. | Saudi Arabia | 8,684 |
| 4. | Bosnia | 4,433 |
| 5. | Syria | 3,310 |
| 6. | Iraq | 2,291 |
| 7. | Pakistan | 1,805 |
| 8. | United States | 1,716 |
| 9. | Tunisia | 1,451 |
It appears that the historical domains of the Ottoman and Mamluk empires, rather than the rest of West Asia, account for the bulk of the Ḥanafī manuscripts in the database. Such results again invite many more questions. How does the number of unique authors change across these regions? How do these results change from century to century? How much do titles overlap across these regions?[20] Furthermore, the relatively low numbers for Pakistan and Central Asia, both historically strong Ḥanafī regions, invite further scrutiny (Muʾassasat ʿIlm is still processing records from Indian libraries, in addition to the 75,000 Indian records already included). To what extent is this a consequence of the theft and destruction of Islamic manuscript collections in South and Central Asia?[21] To what extent may it reveal something different about Ḥanafī legal history, textual production, and pedagogy in these regions? Or does it reflect possible gaps in the metadata?
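One rough way to probe these questions is to normalize the Ḥanafī counts against each country's overall holdings. The Python sketch below does this using only figures given in the tables above; the variable names are my own. It rests on loud assumptions: several country totals are round numbers that are presumably estimates, data for some countries is still being processed, and Bosnia has no published total in the top-fifteen table, so it is skipped in the second calculation.

```python
# Total records tagged as Ḥanafī law.
HANAFI_TOTAL = 69_080

# Ḥanafī law records by country of holding library (top nine).
hanafi_by_country = {
    "Turkey": 23_900,
    "Egypt": 11_601,
    "Saudi Arabia": 8_684,
    "Bosnia": 4_433,
    "Syria": 3_310,
    "Iraq": 2_291,
    "Pakistan": 1_805,
    "United States": 1_716,
    "Tunisia": 1_451,
}

# All records per country, from the published top-fifteen table.
# Bosnia does not appear in that table, so it has no entry here.
country_totals = {
    "Turkey": 340_000,
    "Egypt": 197_841,
    "Saudi Arabia": 120_000,
    "Syria": 32_446,
    "Iraq": 88_517,
    "Pakistan": 93_973,
    "United States": 30_219,
    "Tunisia": 31_591,
}

# Share of the whole Ḥanafī corpus held in the top nine countries.
top_nine_total = sum(hanafi_by_country.values())
top_nine_share = 100 * top_nine_total / HANAFI_TOTAL
print(f"Top nine hold {top_nine_total:,} records ({top_nine_share:.1f}%)")

# Ḥanafī law as a percentage of each country's overall holdings.
hanafi_share_of_holdings = {
    country: 100 * count / country_totals[country]
    for country, count in hanafi_by_country.items()
    if country in country_totals
}
for country, pct in sorted(hanafi_share_of_holdings.items(),
                           key=lambda kv: kv[1], reverse=True):
    print(f"{country:<15}{pct:>6.2f}% of holdings")
```

On these figures, Ḥanafī law makes up under 2% of the Pakistani records but around 7% of the Turkish ones, which suggests that the low Pakistani count is not simply a matter of a smaller overall collection. Given the caveats above, this is a prompt for further inquiry and not a finding.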
Such questions can be further entertained at the individual title level. For instance, when I searched the database for the famous fourteenth-century Ḥanafī fatwā compilation, the Fatāwā Tātārkhāniyya by 'Ālim b. 'Alā' al-Indarpatī al-Dihlawī (d. 786/1384), it returned 296 records in total, including 164 in Turkey, 26 in Syria, 17 in Saudi Arabia, 16 in Bosnia, 14 in Egypt, and 12 in Pakistan. This again showed a similar geographical slant for this Indian fatwā collection.
Such data will no doubt change as Muʾassasat ʿIlm continues its work. Yet it has already shown its ability to stimulate large questions and research across multiple fields. It is safe to say that the Muʾassasat ʿIlm database will have a massive impact on the field once it is available more widely. Not only will it help researchers locate individual manuscripts, but it will also facilitate larger data-based questions about the manuscript cultures of Islamic legal history. We may very well be on the verge of entering a new era.
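The same arithmetic can be applied to this single title. The short Python sketch below uses the counts just given, plus the Turkish and overall Ḥanafī totals from the earlier tables; the variable names are my own, and the comparison is only illustrative.

```python
# Records returned for the Fatāwā Tātārkhāniyya.
TITLE_TOTAL = 296

# Countries named in the search results.
listed = {
    "Turkey": 164,
    "Syria": 26,
    "Saudi Arabia": 17,
    "Bosnia": 16,
    "Egypt": 14,
    "Pakistan": 12,
}

# Records held in countries not named above.
listed_total = sum(listed.values())
unlisted = TITLE_TOTAL - listed_total
print(f"Listed: {listed_total}, elsewhere: {unlisted}")

# Turkey's share of this title versus its share of all Ḥanafī law records.
turkey_share_title = 100 * listed["Turkey"] / TITLE_TOTAL
turkey_share_hanafi = 100 * 23_900 / 69_080
print(f"Turkey holds {turkey_share_title:.1f}% of this title's records "
      f"and {turkey_share_hanafi:.1f}% of all Ḥanafī law records")
```

The six named countries account for 249 of the 296 records, leaving 47 elsewhere. Turkey's share of this title (about 55%) is noticeably higher than its share of the Ḥanafī corpus as a whole (about 35%).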
Notes:
[1] For some recent examples pertaining to manuscript cultures, see Nir Shafir, The Order and Disorder of Communication: Pamphlets and Polemics in the Seventeenth-Century Ottoman Empire (Redwood City: Stanford University Press, 2024); Ahmed El Shamsy, "The Section on Islamic Law (fiqh)," in The Library of Aḥmad Pasha al-Jazzār: Book Culture in Late Ottoman Palestine, eds. Said Aljoumani, Guy Burak and Konrad Hirschler (Leiden: Brill, 2024), 299–314; several chapters in the edited volume, Berat Açıl, ed., Osmanlı Kitap Kültürü Cârullah Efendi Kütüphanesi ve Derkenar Notları, 2nd ed. (Istanbul: İlem Yayınları, 2020); Olly Akkerman, A Neo-Fatimid Treasury of Books: Arabic Manuscripts among the Alawi Bohras of South Asia (Edinburgh: Edinburgh University Press, 2022).
[2] For some relevant works, see Eltjo Buringh, Medieval Manuscript Production in the Latin West: Explorations with a Global Database (Brill, 2011); Marilena Maniaci, ed., Trends in Statistical Codicology (De Gruyter, 2022); Leo Lahti, Jani Marjanen, Hege Roivainen and Mikko Tolonen, "Bibliographic Data Science and the History of the Book (c. 1500–1800)," Cataloging & Classification Quarterly 57, no. 1 (2019): 5–23; Ezio Ornato, "The Application of Quantitative Methods to the History of the Book," in The Oxford Handbook of Latin Palaeography, eds. Frank T. Coulson and Robert G. Babcock (New York: Oxford University Press, 2020), 651–78. Also see the special issue edited by Evina Stein and Gustavo Fernández, "Introduction: Fitting Manuscript Studies into the Historical Network Research," Journal of Historical Network Research 9 (2023): iii–xvi; and the series by The Institute for Documentology and Editorship (est. 2006), available at: https://www.i-d-e.de/publikationen/.
[3] Huw Jones and Yasmin Faghihi, "Manuscript Catalogues as Data for Research: From Provenance to Data Decolonisation," Digital Humanities Quarterly 18, no. 3 (2024).
[4] Christopher D. Bahl, Mobile Manuscripts: Arabic Learning across the Early Modern Western Indian Ocean (New York: Cambridge University Press, 2025).
[5] Sohaib Baig, "The Textual Landscapes of Ḥanafī Eurasia: South Asian Scholarship in Turkish Manuscript Collections (Part 1 of 2)," Islamic Law Blog, August 10, 2023, https://islamiclaw.blog/2023/08/10/the-textual-landscapes-of-%e1%b8%a5anafi-eurasia-south-asian-scholarship-in-turkish-manuscript-collections-part-1-of-2.
[6] See, for instance, Kalthoum Adam, Asim Baig, Somaya Al-Maadeed, et al., "KERTAS: dataset for automatic dating of ancient Arabic manuscripts," IJDAR 21 (2018): 283–90; Reza Farrahi Moghaddam, Mohamed Cheriet, et al., "IBN SINA: a database for research on processing and understanding of Arabic manuscripts images," in Proceedings of the 9th IAPR International Workshop on Document Analysis Systems (DAS '10) (New York: Association for Computing Machinery, 2010), 11–18.
[7] L.W. Cornelis van Lit, Among Digitized Manuscripts: Philology, Codicology, Paleography in a Digital World (Leiden: Brill, 2020), 227–86.
[8] See especially, Jones and Faghihi, "Manuscript Catalogues as Data for Research."
[9] Ibid.
[10] See an approach towards this problem regarding 17th century European book history: Marie-Louise Coolahan, "My Lady's Books: Devising a Tool Kit for Quantitative Research; or, What Is a Book and How Do We Count It?," Huntington Library Quarterly 84, no. 1 (2021): 125–37.
[11] Marilena Maniaci, "Statistical Codicology: Principles, Directions, Perspectives," in Trends in Statistical Codicology, 6.
[12] For instance, Jones and Faghihi, "Manuscript Catalogues as Data for Research," and Cornelis van Lit, Among Digitized Manuscripts, 284.
[13] See "Islamic Civilization Manuscripts Database," Ilm Arabia, n.d., https://ilmarabia.com/home (last visited January 8, 2025).
[14] See "Who we are," Ilm Arabia, n.d., https://ilmarabia.com/about (last visited January 8, 2025).
[15] Personal visit to the database, Cairo, January 19, 2025.
[16] "Malaf Ta'rifi," Muʾassasat ʿIlm li-Iḥyāʾ al-Turāth wa-l-Khidmāt al-Raqmiyya (al-Qahira – Lundun) (1446/2024), 27.
[17] Ibid.
[18] I am thankful to Shaykh 'Abd al-'Aati al-Sharqawi, Chairman and Managing Director, for generously providing me with an overview and sharing access on-site in Cairo. I am also thankful to Garret Davidson and Ahmad Khan for sharing insights about the project. Any errors or mistakes are my own responsibility.
[19] "Malaf Ta'rifi," 23.
[20] While the Muʾassasat ʿIlm platform can supply answers, I did not have the opportunity to pursue them.
[21] See Nur Sobers-Khan, "Muslim Scribal Culture in India Around 1800: Towards a Disentangling of the Mughal Library and Delhi Collection," in Scribal Practice and the Global Cultures of Colophons, 1400–1800, New Transculturalisms, eds. C.D. Bahl and S. Hanß (Switzerland: Palgrave Macmillan, 2022), 217–18.