
::Roundtable:: Knowledge, LLMs, and Open Source Communities in Islamic Legal Scholarship


By islamiclawblog on March 13, 2025

By Mairaj Syed

If harnessed effectively, the introduction of LLMs has the potential to transform research in Islamic legal studies because of its unique ability to allow computers to go beyond the literal phrasing of texts to recognize the deeper semantic patterns across vast corpora. This transformation would not be unprecedented. At its core, scholarship involves gathering data, organizing it in some way, synthesizing it in order to add new knowledge and then communicating it to interested audiences. Throughout history, technological and pedagogical innovations have reshaped these fundamental components of scholarship, and the Islamic scholarly tradition has been no exception.

Let's consider prior major changes in Islamic scholarship. From the seventh/first to the ninth/third century, ḥadīth transmission went from a largely oral and informal process to one that relied on more precise and formal methods of dissemination.[1] By the mid-eighth century, ḥadīth scholars insisted on a much more precise naming of one's sources and verbatim reproduction of the transmitted texts.[2] The introduction of paper-making techniques in the ninth century and the resulting drop in the cost of book production had a massive impact on the intellectual culture of the age, inaugurating for the first time in history the emergence of "literate societies."[3] The simple ability to review written material made legal scholars much more attentive to ensuring that their legal views formed a coherent body of law.[4] The nineteenth century saw similar transformations wrought by the introduction of the printing press.[5]

In our lifetimes we have witnessed the digitization of Islamic sources, making them machine searchable and widely available to a large public. Anyone with a computer can access Islamic materials. Laborious trips to vaunted, richly endowed, and often dimly lit libraries are no longer needed. The adept madrasa student in Bangladesh has access to many of the same important sources as the richly resourced Princeton professor.

And now we are witnessing yet another transformation with the introduction and dissemination of large language models, often colloquially referred to as artificial intelligence. The dawning of this new age portends changes in scholarship that we are only now beginning to discern. In what follows, we'll explore how these machines understand and represent our textual inheritance, how scholars might harness these new powers, and what kinds of communities we'll need to forge as we enter uncharted territory.

Representation of Meaning: From Databases to LLMs

One way to understand a new technology is to compare it to its predecessor. Traditional databases and Large Language Models (LLMs) represent fundamentally different approaches to handling textual knowledge. A database excels at precisely storing and retrieving information: given a query, it can locate exact matches with perfect fidelity. This mirrors the expertise of the ḥadīth scholar, whose primary concern is the accurate preservation and transmission of texts, along with their chains of transmission. Just as a database can instantly retrieve a specific ḥadīth, with its exact wording intact, the ḥadīth scholar's training prioritizes maintaining fidelity in transmitting information. For Islamic law researchers, the Maktaba Shamela database exemplifies both the power and limitations of this approach.[6] While researchers can instantly locate every instance of a word or phrase across thousands of texts, the database can only find exact matches—missing conceptually related discussions that use different terminology or even slightly varied phrasing.
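To make the limitation concrete, here is a minimal sketch (the two-text "corpus" and its wording are invented for illustration): an exact-match search retrieves only texts containing the query string verbatim, and so misses a second text that expresses the same ruling in different words.

```python
# Toy illustration of exact-match retrieval (not Maktaba Shamela's
# actual interface): only literal occurrences of the query are found.
corpus = {
    "text_1": "the jurist declared the transaction haram",
    "text_2": "the jurist judged the sale to be forbidden",  # same idea, different wording
}

def exact_match_search(corpus, query):
    """Return ids of texts containing the query string verbatim."""
    return [doc_id for doc_id, text in corpus.items() if query in text]

print(exact_match_search(corpus, "haram"))  # only text_1; text_2 is missed
```

The conceptually related discussion in `text_2` is invisible to this kind of search, which is precisely the gap that semantic representations aim to close.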

In contrast, LLMs, perhaps somewhat like a trained jurist, excel not at precise retrieval but at inductively understanding patterns and relationships within texts. A jurist, through years of studying legal texts, develops an intuitive grasp of precedents, principles, canonical texts and their applications. They learn to recognize patterns in legal reasoning, to draw analogies between cases, and to synthesize diverse textual evidence into coherent legal arguments. Similarly, LLMs develop an "understanding" of textual patterns within a corpus that allows them to recognize relationships and generate responses that reflect the underlying semantic structure of the texts they were trained on, even if they cannot cite specific sources with the precision of a database, nor reproduce the exact texts that formed the basis of the patterns they have learned. Presumably, an LLM trained on the Maktaba Shamela corpus, for example, may be able to identify relevant passages even when they use different terminology, because it has learned to recognize conceptual relationships rather than just matching exact words.

To understand how LLMs represent meaning, we need to start with the concept of vectors, which at heart is simply a list of numbers. Imagine a simple two-dimensional graph, like those we used in high school mathematics. Any point on this graph can be represented by two numbers: its position along the horizontal axis (x) and the vertical axis (y). When we draw an arrow from the origin (0,0) to this point, we create a vector—its direction and length completely defined by that pair of numbers.

Figure 1: Examples of vectors

In the diagram above, we can see three vectors (v1, v2, and v3) plotted in two-dimensional space. Notice how v1 and v2 are relatively close to each other—not only do they point in similar directions, but the distance between their endpoints is small. In contrast, v3 points in a very different direction and is far from both v1 and v2. This ability to measure distances between vectors is crucial: in a vector space, we can mathematically quantify how similar or different vectors are based on their proximity to each other.
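This proximity can be computed directly. The sketch below assigns made-up coordinates to stand in for v1, v2, and v3 and measures the Euclidean distance between their endpoints:

```python
import math

# Hypothetical endpoints for the three vectors in Figure 1.
v1 = (2.0, 3.0)
v2 = (2.5, 2.5)
v3 = (-3.0, -1.0)

def distance(a, b):
    """Euclidean distance between the endpoints of two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(distance(v1, v2))  # small: v1 and v2 are similar
print(distance(v1, v3))  # large: v3 points elsewhere
```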

To make this concrete as it applies to LLMs and Islamic scholarship, let's look at how we might represent different types of words or concepts in a simple two-dimensional vector space. Let's work with three example concepts: that of the "ḥadīth scholar," "jurist," and "lay person." We have intuitive reason to suppose the ḥadīth scholar and jurist are more like each other than they both are to the layperson: they both have training and expertise on sets of texts, the former more comprehensively with respect to ḥadīth, the latter more extensive across many different disciplines. One way to represent this is to posit that the vertical axis represents expert training, while the horizontal axis represents personhood. In the second diagram, we've plotted vectors for a ḥadīth scholar, a jurist, and a layperson. Notice how the ḥadīth scholar and jurist vectors are relatively close to each other—reflecting their shared commonality of both being humans and having gone through training as a scholar. The layperson vector points in a different direction, and a vector representing an inanimate object, such as a candle, would point in a completely different direction altogether.

Figure 2: Example of Concept Representation in 2-D vector space

Now imagine that we increase the dimensions making up a vector from two to hundreds or even thousands. Beyond just "expertise" or "personhood," the other dimensions might track properties like "gender," "level of abstraction," "time," or "part of speech," with each number representing the extent to which that property is present in the concept. One can quickly see how such a system can intricately represent a corpus's semantic structure.
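In practice, similarity between high-dimensional vectors is often measured by the cosine of the angle between them rather than by raw distance. The sketch below hand-assigns toy three-dimensional vectors to the concepts from Figure 2; the dimensions and values are invented purely for illustration:

```python
import math

# Invented vectors; the dimensions might stand for
# (personhood, expert training, breadth of disciplines).
concepts = {
    "hadith_scholar": [1.0, 0.9, 0.4],
    "jurist":         [1.0, 0.8, 0.9],
    "layperson":      [1.0, 0.1, 0.1],
    "candle":         [0.0, 0.0, 0.0],  # inanimate: no personhood, no training
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# The scholar is more similar to the jurist than to the layperson.
print(cosine_similarity(concepts["hadith_scholar"], concepts["jurist"]))
print(cosine_similarity(concepts["hadith_scholar"], concepts["layperson"]))
```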

While this example helps us understand how meaning can be represented mathematically, it is not, in all honesty, how LLMs actually work. The creators of LLMs do not determine what properties these dimensions will represent, nor do they set the numbers manually. Rather, LLMs learn these numbers through training on a corpus.

This training begins by taking a large corpus of text and dividing it into manageable chunks, each containing 512 to 2048 tokens (roughly parts of words). For each chunk, every word is initially assigned a random vector—imagine each word getting a list of random numbers across hundreds of dimensions. The model then plays a prediction game: in each chunk, it masks out a word (or several words) and tries to predict what should go there based on the vectors of the surrounding words. For example, in the sentence "The scholar wrote a ___," a trained model would try to predict "book" or "treatise" using the vectors of "The," "scholar," and "wrote." But before being trained, it would guess something completely random like "Palestine."

When the model predicts incorrectly—say, guessing "butterfly" instead of "book"—it measures the mismatch between its guess and the correct word in the chunk it is processing at the moment. Since both words are represented as vectors (those lists of hundreds of numbers), the model can precisely measure how far off it was, just like calculating the distance between two points on a map. Then, like a student learning from mistakes, it makes tiny adjustments to all the related vectors to make the correct prediction more likely next time. This process of continuous adjustment, called gradient descent, repeats across millions of text chunks until the model's predictions become reliably accurate. At this point, we can say that the vectors have captured meaningful patterns in the language—their dimensions now encode complex semantic relationships present in the training corpus.
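The adjustment loop can be caricatured in a few lines. This is a deliberately drastic simplification: real training updates millions of parameters via backpropagation, whereas here a single invented "book" vector is nudged toward a prediction formed by averaging invented context vectors:

```python
# Toy sketch of repeated tiny adjustments (all vectors are invented).
def average(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def squared_error(a, b):
    """Squared distance between two vectors: the model's 'mismatch'."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def step(word_vec, predicted, lr=0.1):
    """Move word_vec a small fraction toward the prediction."""
    return [w + lr * (p - w) for w, p in zip(word_vec, predicted)]

context = [[0.2, 0.8], [0.4, 0.6]]   # vectors for context words like "The", "scholar"
book = [0.9, 0.1]                    # randomly initialised vector for "book"
predicted = average(context)         # the model's guess for the masked slot

before = squared_error(book, predicted)
for _ in range(50):                  # gradient-descent-like repetition
    book = step(book, predicted)
after = squared_error(book, predicted)
print(before > after)  # True: the error shrinks with each adjustment
```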

What makes these dimensions particularly fascinating is that unlike in our simple diagram where we deliberately assigned meaning to each axis, the actual dimensions that emerge during LLM training aren't pre-defined. What each dimension represents—if it represents anything interpretable at all—is an active area of research, like how neuroscientists study where and how concepts are encoded in the human brain. Just as researchers are still working to understand how fundamental concepts like time or space are represented across neural networks in our brains, AI researchers are developing tools and methods to understand how language models encode meaning across their numerous dimensions. The goal isn't just to make these models more effective, but to potentially uncover patterns in how knowledge is structured that might not be apparent through traditional scholarly methods.

Traditional computational techniques such as topic modeling, text-reuse, and term-frequency analysis can reveal important patterns in Islamic legal texts. Christian Lange and his colleagues demonstrated this in their pioneering work, which systematically analyzed over 11,000 texts from the Maktaba Shamela corpus to map citation networks and track the evolution of legal terminology across different schools and time periods.[7] However, the use of LLMs offers additional analytical possibilities. Where frequency analysis might tell us how often certain terms or concepts appear, LLMs can potentially detect more subtle patterns in how these concepts are used and related to each other. For example, while counting instances of "ḥarām" may indicate a school's general tendency toward prohibition as Lange et al. find,[8] an LLM trained on that school's texts might reveal distinctive patterns in how prohibitions are justified, how they relate to other legal concepts, and how they fit into broader frameworks of legal reasoning. This deeper layer of analysis is possible because LLMs don't just count words—they encode the semantic relationships between concepts based on their usage contexts.

Applications of LLMs in Islamic Legal Research

The emergence of large language models has coincided with a transformation in the machine learning ecosystem. While companies like OpenAI and Anthropic maintain closed-source models, the research community has developed numerous open-source alternatives, many trained on Arabic and Islamic texts. Platforms like Hugging Face have become central hubs for sharing and collaborating on these open-source models, making powerful language models accessible to researchers worldwide.

Below, considering the fundamental way in which LLMs can represent the semantic structure of a corpus, and the recent emergence of a robust ecosystem of AI tools, I sketch what LLMs may enable the field of Islamic legal studies to accomplish. Before they can do so, however, they must be modified. These models can be further specialized through a process called fine-tuning, where an off-the-shelf model's existing knowledge is refined for specific tasks using carefully selected and curated training data.

To illustrate the transformative potential of fine-tuned models compared to traditional computational methods, consider the task of identifying terms denoting people, places, or institutions—a process that a well-trained LLM can complete in minutes, whereas conventional methods might require months of meticulous work. Traditional computational approaches typically demand creating an exhaustive catalog of expected terms for people, places, and institutions, then systematically searching and annotating the corpus. The fundamental challenge lies in constructing this initial comprehensive set, which necessitates thoroughly scanning the entire corpus and developing intricate rules to capture the nuanced variations in how these terms might appear across different texts.

This same task, in machine learning parlance, is called Named Entity Recognition (NER), and it proceeds in a fundamentally different way.[9] Instead of predefined lists, scholars take a representative subset of the corpus and manually annotate terms denoting people, places, and institutions. For example, a human annotator would mark "Abū Ḥāmid al-Ghazālī" with the label "Person," "Ṭūs" with the label "Place," and "al-madrasa al-Niẓāmiyya" with the label "Institution." The human annotator would do this on several different representative and diverse samples from the corpus.
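Such annotations are commonly stored in a BIO-style tagging scheme (B- marks the beginning of an entity, I- its continuation, O everything outside any entity). The sample below is invented for illustration, using the entities mentioned above:

```python
# Hypothetical annotated training sample as (token, BIO-tag) pairs.
sample = [
    ("Abū", "B-PERSON"), ("Ḥāmid", "I-PERSON"), ("al-Ghazālī", "I-PERSON"),
    ("of", "O"),
    ("Ṭūs", "B-PLACE"),
    ("taught", "O"), ("at", "O"),
    ("al-madrasa", "B-INSTITUTION"), ("al-Niẓāmiyya", "I-INSTITUTION"),
]

def extract_entities(tagged):
    """Group B-/I- tagged tokens into (entity_text, label) spans."""
    entities, current, label = [], [], None
    for token, tag in tagged:
        if tag.startswith("B-"):
            if current:
                entities.append((" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-") and current:
            current.append(token)
        else:  # "O" tag closes any open entity
            if current:
                entities.append((" ".join(current), label))
            current, label = [], None
    if current:
        entities.append((" ".join(current), label))
    return entities

print(extract_entities(sample))
```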

These human-annotated samples would then be fed to an LLM, which would attempt to learn the patterns behind the labels. Instead of predicting a masked word or the next word in a sample, the LLM trains on this annotated corpus by trying to predict the correct label for every word it encounters in each text chunk. Fine-tuning for NER specifically teaches the model to identify whether each word represents a particular type of entity. When the model makes an incorrect classification—for instance, failing to recognize "ṣāḥib al-Iḥyāʾ" (the author of the book Iḥyāʾ ʿulūm al-dīn) as referring to a "Person"—it gradually adjusts its internal vectors until it makes consistent, accurate predictions. Through this process, the model learns the patterns and contexts found in its training examples in a way that it can apply to examples it has never seen.

This capability is particularly valuable in Islamic legal texts, where scholars may be referred to by multiple names (e.g., Abū Ḥāmid, al-Ghazālī, and al-Ghazzālī), titles (e.g., ṣāḥib al-Iḥyāʾ, "author of the Revival"), or honorifics (e.g., Ḥujjat al-Islām, "Proof of Islam"). A single scholar might appear under dozens of different references throughout a corpus, which traditional rule-based systems would struggle to capture without extensive manual effort. Creating an LLM fine-tuned for NER on an Islamic legal studies corpus would require significant experimentation to determine the optimal base model, develop a representative annotated training set, establish evaluation metrics, and determine the optimal configuration of variables that control how the model learns from the data. This development process is inherently iterative and collaborative, requiring both domain expertise from Islamic legal scholars and technical knowledge from Machine Learning researchers. It's important to note, however, that even the best fine-tuned models are unlikely to achieve 100% or even near-100% accuracy.[10] Rather than pursuing perfect automation, the most effective approach would likely involve dividing tasks between humans and the LLM in an optimally efficient workflow—perhaps using the model for initial annotation and human experts for verification, or having the model handle common cases while specialists address edge cases and ambiguities.
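A minimal sketch of that human-machine division of labor might route each model prediction by a confidence threshold; the spans and confidence scores below are invented for illustration:

```python
# Triage sketch: accept confident model predictions automatically,
# queue the rest for review by a human expert.
predictions = [
    {"span": "al-Ghazālī", "label": "PERSON", "confidence": 0.98},
    {"span": "ṣāḥib al-Iḥyāʾ", "label": "PERSON", "confidence": 0.55},
    {"span": "Nīsābūr", "label": "PLACE", "confidence": 0.91},
]

def triage(predictions, threshold=0.85):
    """Split predictions into auto-accepted and needs-human-review."""
    accepted = [p for p in predictions if p["confidence"] >= threshold]
    review = [p for p in predictions if p["confidence"] < threshold]
    return accepted, review

accepted, review = triage(predictions)
print(len(accepted), len(review))  # 2 1: one ambiguous epithet goes to a human
```

The threshold itself would be tuned against the evaluation metrics the text describes, trading annotation speed against the expert time available.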

The scale of this opportunity becomes clear when we consider the size of the corpus: the Maktaba Shamela collection alone contains approximately 1,400 legal texts from the Sunnī tradition, representing just one branch of Islamic legal thought. A fine-tuned language model could automatically annotate this vast corpus, identifying and classifying entities such as scholars' names, places, institutions, historical events, dates, tribal affiliations, and professional roles. This automated analysis could accomplish in hours what would require years of manual labor. It could create a rich semantic layer over the texts, transforming them from raw text into a structured network of interconnected entities.

Such an entity-aware corpus would enable numerous research possibilities. Scholars can trace the evolution of legal concepts through networks of citation and influence, mapping how ideas spread across geographical regions and through generations of scholars. The automated identification of dates and places allows for the creation of detailed chronologies and geographical distributions of legal discourse. When professional roles are tagged, researchers can analyze how different types of scholars—judges (quḍāt), jurists (fuqahāʾ), traditionists (muḥaddithūn)—engaged with different legal questions.

The applications extend beyond traditional historical research. By identifying institutions, we can map the development of formal legal structures and educational institutions (madrasas). Tribal affiliations, when tagged systematically, could reveal patterns in the social distribution of legal authority. The identification of specific types of legal rulings (fatāwā) and their contexts can reveal how legal reasoning adapted to different social and historical circumstances and how jurists responded to changing societal needs.

Perhaps most importantly, this rich annotation layer makes the corpus accessible to new forms of quantitative analysis. Researchers can ask questions that would be impossible to answer through manual reading: How did the frequency of citations to earlier authorities change over time? What was the geographical distribution of different legal methodologies? How did networks of scholarly influence shift across centuries? These questions become answerable not through replacing traditional scholarly methods, but by augmenting them with new analytical capabilities. The ability of an LLM to automatically recognize named entities provides a foundation for more sophisticated analysis of legal texts.
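Once entities are tagged, such questions reduce to simple queries over structured records. The sketch below uses invented citation records (author's century AH, authority cited) to count how often one authority is cited per century:

```python
from collections import Counter

# Invented records standing in for an entity-annotated corpus.
citations = [
    {"century": 3, "authority": "al-Shāfiʿī"},
    {"century": 4, "authority": "al-Shāfiʿī"},
    {"century": 4, "authority": "Mālik"},
    {"century": 5, "authority": "al-Shāfiʿī"},
    {"century": 5, "authority": "al-Shāfiʿī"},
]

def citations_per_century(records, authority):
    """How often is a given authority cited in each century?"""
    return Counter(r["century"] for r in records if r["authority"] == authority)

print(citations_per_century(citations, "al-Shāfiʿī"))  # {3: 1, 4: 1, 5: 2}
```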

Towards an Open-Source Islamic Law Community

The integration of large language models into Islamic legal studies represents more than just another technological innovation in a long history of such changes. Like the shift from oral to written transmission, the advent of paper, and the introduction of printing and digital databases, LLMs have the potential to transform not just how we access and analyze texts, but how we organize the entire scholarly enterprise. The technology has dramatically reduced the resources needed to create richly annotated legal corpora that can facilitate both research and practical application of Islamic law. What might have once required years of coordinated effort by teams of scholars to simply systematically identify people, places, and institutions across a corpus of thousands of texts can now potentially be accomplished through careful application of these new tools. Given a substantial enough training dataset and a limited task scope like named entity recognition, successfully fine-tuning an LLM may take just a few months of a small team's labor.[11] Then, a single LLM fine-tuned on Islamic texts with an acceptable error rate can process this corpus and provide a first draft annotation marking all people, places, and things in a matter of hours.

However, this very efficiency brings new challenges. While LLMs may reduce the raw human hours needed for certain tasks, they demand new forms of collaboration that cut across traditional academic and professional boundaries. Creating effective systems requires intimate knowledge of the legal traditions that make up the corpus, sophisticated understanding of LLM deployment and training, and careful evaluation of model performance against scholarly standards. Domain experts—historians and scholars of Islamic studies and Islamic law—are essential for generating quality training data by manually annotating sample texts to identify people, places, and institutions. These same scholars are uniquely positioned to evaluate whether fine-tuned models accurately identify entities in previously unseen texts. Complementing this expertise, AI engineers bring critical knowledge of model architectures, fine-tuning methodologies, and computational frameworks necessary to build the technical infrastructure. Such collaborations may yield various outputs: co-authored research papers in technical journals, publicly released fine-tuned models, or annotated corpora that serve as resources for the broader scholarly community. This kind of interdisciplinary collaboration runs counter to the humanities' traditional model of individual scholarship, where career advancement often depends on sole-authored publications rather than collaborative infrastructure building.

The stakes of getting this right extend far beyond academia. Islamic legal traditions represent one of humanity's most sophisticated and extensive engagements with questions of ethics, justice, and social organization. Yet today, in my view, these traditions are often either marginalized in global discussions or viewed primarily through the lens of national security concerns. Open-source development of tools for studying and understanding these texts could help reframe this dynamic, making these intellectual resources more accessible both to the communities who have historically drawn upon them and to broader global audiences interested in engaging with them seriously. Examples of such tools include fine-tuned LLMs that scholars can further develop or deploy on their chosen corpora, annotated datasets that facilitate the training of specialized LLMs, and text-processing tools tailored for Islamic legal studies (e.g., tools for parsing classical Arabic legal terminology). Additionally, the outputs of these tools—such as curated, annotated corpora enriched through LLM-assisted processing—could become valuable shared resources for scholars. The goal is to foster a research community that openly shares data and expertise, developing a set of reliable and accurate methods, datasets, and tools that compensate for the shortcomings of commercial LLMs while leveraging their strengths in contextualized pattern recognition. The path forward likely lies in building interdisciplinary communities that connect AI researchers and Islamic studies scholars. This would require rethinking not just how we use technology, but how we organize and reward scholarly work.

This challenge brings us full circle to the transformative effects of technology on scholarship. Just as the introduction of paper changed not just how texts were recorded but how knowledge was organized and transmitted, LLMs may change not just how we analyze texts but how we structure the entire enterprise of scholarship. The question before us is not just how to use these tools effectively, but how to build communities and institutions that can realize their potential while remaining true to the profound intellectual traditions they seek to study and preserve.

Suggestions for Further Study

For those looking to gain a foundational understanding of LLMs and their development, I recommend starting with 3blue1brown's course on Neural Networks for its clear and engaging explanation of the mathematics and theory behind these systems. Following that, Andrej Karpathy's course, Neural Networks: Zero to Hero, provides one of the most insightful expositions I've encountered. Karpathy not only walks viewers through the mathematics, algorithms, and architecture of large language models but also demonstrates their historical development, showing how researchers have constructed these systems piece by piece over the past decade. His practical demonstrations in Python further demystify the process of building such models. Additionally, leveraging AI tools for interactive learning—asking questions about challenging concepts and exploring their responses—can be an immensely motivating way to deepen understanding. I've found NotebookLM to be an excellent resource for understanding concepts found in a particular set of articles or videos, because it tries to limit its responses to the sources you give it and gives citations for how it developed its answer. I use OpenAI's ChatGPT and Anthropic's Claude to get a general sense of a concept's structure referenced in the literature or in videos by simply asking the chatbot to expand on it or sharing what I understand and having it respond to my understanding.

Notes:

[1] Schoeler notes that by the middle of the ninth/third century, with the establishment of Baghdad as the center for ḥadīth transmission, scholars preferred ḥadīth transmitted based on written aids over those from memory alone. For this point specifically, see Gregor Schoeler, The Oral and the Written in Early Islam, trans. Uwe Vagelpohl (New York: Routledge, 2006), 115. For a history of contrasting attitudes towards different types of ḥadīth transmission and its actual practice in early Islam, see ibid., 111–41; Michael Cook, "The Opponents of the Writing of Tradition in Early Islam," Arabica 44, no. 4 (1997): 437–530.

[2] Jonathan A. C. Brown, Hadith: An Introduction (Oneworld Publications, 2009), 22, 90.

[3] Bloom writes, "Writing developed in some centers of ancient civilization, but the transformation to literate society was accomplished only with the help of paper, a writing material invented in China about two thousand years ago. Nevertheless, it was not the Chinese who exploited the potential of paper in this way, but the Muslims of West Asia, beginning in the ninth century. Their use of paper for writing inaugurated 'a new era of civilization, the one we live in now,' as the historian Alfred von Kremer wrote more than a century ago." Jonathan M. Bloom, Paper before Print: The History and Impact of Paper in the Islamic World, ACLS Humanities E-Book. (New Haven: Yale University Press, 2001), 17, https://doi.org/2027/heb.06496, 10.37862/aaeportal.00217.

[4] See Ahmed El Shamsy, The Canonization of Islamic Law: A Social and Intellectual History (Cambridge: Cambridge University Press, 2013), 165–66.

[5] Ahmed El Shamsy, Rediscovering the Islamic Classics: How Editors and Print Culture Transformed an Intellectual Tradition (Princeton, NJ: Princeton University Press, 2020), 6–7, https://doi.org/10.1515/9780691201245.

[6] Anecdotally, I can say with confidence that many researchers use digitized corpora, like Maktaba Shamela, as do I, but few seem to acknowledge it in their published works, and fewer still cite it directly. For an example of acknowledgement of use, see Ahmed El Shamsy, "Al-Shāfiʿī's Written Corpus: A Source-Critical Study," Journal of the American Oriental Society 132, no. 2 (2012): 206, https://doi.org/10.7817/jameroriesoci.132.2.0199. For a couple of examples of citations, see Nitzan Amitai-Preiss et al., "A Terracotta Pen-and-Inkwell Case from Jerusalem," 'Atiqot 110 (2023): 223; Nadeen Mustafa A. Alsulaimi, "Islamic and Western Approaches to the Qur'ān: A Rhetorical and Thematic Analysis of Sūrah 4 'The Women' (al-Nisā')," (PhD diss., Catholic University of America, 2018), 27, ProQuest (10792431).

[7] C. Lange et al., "Text Mining Islamic Law," Islamic Law and Society 28, no. 3 (2021): 256–58, https://doi.org/10.1163/15685195-bja10009.

[8] Ibid.

[9] For one recent example of developing NER in Arabic, see Khaled Shaalan and Mai Oudah, "A Hybrid Approach to Arabic Named Entity Recognition," Journal of Information Science 40, no. 1 (2014): 67–87, https://doi.org/10.1177/0165551513502417.

[10] One recent approach, which combines both "rule-based" approaches with machine learning ones on Arabic texts, boasts F1 scores of ~90%. See ibid.

[11] The biggest time and labor cost is the creation of the training dataset. On the general lack of open-source Arabic corpora for NLP development and research and therefore the time it takes to create them, see ibid., 69. Here are some back-of-the-envelope calculations on the resources required: The Maktaba Shamela corpus contains approximately 1 billion tokens across 8,000 texts, with about 1,400 texts pertaining to Islamic law—approximately 175 million tokens. A sample size of 256 tokens (a little under half a page) yields 683,594 potential samples. Assuming we need a training dataset of one percent of these samples, we would require 6,836 annotated examples. At an average annotation time of 7 minutes per sample, this would require about 798 hours of labor. Two annotators working 20 hours weekly could complete this in approximately 5 months. While creating a high-quality annotated dataset represents the most challenging part of the process, experimenting with base models and fine-tuning configurations would require an additional 2 months.
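As a sanity check, the footnote's arithmetic can be reproduced in a few lines (the token counts and annotation rates are the estimates given above):

```python
# Back-of-the-envelope check of the annotation estimate in note [11].
law_tokens = 175_000_000                       # tokens in the legal subset
sample_size = 256                              # tokens per sample
potential_samples = round(law_tokens / sample_size)   # ≈ 683,594
training_samples = round(potential_samples * 0.01)    # one percent ≈ 6,836
minutes = training_samples * 7                 # 7 minutes per annotated sample
hours = minutes / 60                           # ≈ 798 hours
weeks = hours / 40                             # two annotators at 20 hrs/week each
print(potential_samples, training_samples, round(hours), round(weeks))
```

At roughly 20 weeks of part-time work, this lands at about five calendar months, as the note states.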

Islamic Law Blog © 2025.