Dr. Vahab Mirrokni

Information and Communication Science and Technology

Year of Birth:

1979

Place of Birth:

Iran

Work:

Locality-Sensitive Hashing Scheme Based on p-Stable Distributions

Biography

A Journey of a Curious Mind

The journey and quest of every researcher’s mind weave a tapestry of questions, experiences, and experiments that shape the path of science. Vahab Mirrokni embarked on this journey from a childhood notebook filled with problems, progressing through school classrooms and robotics competitions to prestigious global universities and major research labs. His diverse experiences reveal that scientific progress is not a linear path but a network of discoveries, experiments, and varied encounters, constantly shaped by interaction with real-world challenges.
His scientific journey, from school classrooms to university and industrial research, has always been intertwined with discovery and challenge. At Karaj’s School for the Gifted, he spent hours of his high school years grappling with the toughest problems. Olympiads and global RoboCup competitions were his first arenas of growth, where he learned that confidence and teamwork form the foundation of any great success. He still recalls the day his team won first place in Europe; for him, the experience of collaboration and self-belief was more valuable than the medals.
Entering university set his path apart from conventional education, leading him to Sharif University of Technology, where projects, programming competitions, and robotics deepened his passion for algorithms. These experiences introduced him to creative solutions for complex problems. This approach later became a consistent pattern in his research: breaking down problems into smaller parts, analyzing them meticulously, and reconstructing them in novel ways. It was at this university that he realized his future lay in theoretical computer science, a choice that brought him to MIT in Boston in 2005. Surrounded by brilliant minds, he immersed himself in that world, an environment that not only taught him to think more deeply but also showed him that science is most valuable when it connects to real life.
A Path To Global Innovation
After graduating from MIT, Mirrokni worked at Amazon and Microsoft Research, where he transformed theoretical algorithms into solutions that impact millions of users daily. His true destination, however, was Google Research, where he has been active in large-scale projects for more than a decade. There, he works with interconnected data on a scale sometimes as vast as the global population, an experience that reminds him that science is meaningful only when it extracts practical solutions from theoretical foundations. He currently leads algorithm research groups in New York, with projects spanning large-scale market algorithms, optimization, graph mining, and next-generation AI initiatives such as Gemini AI.
The world of artificial intelligence remains a constant adventure for him. Each month brings new models and methods that surpass the imagination of yesterday. What amazes him the most is the ability of systems to learn and improve autonomously—a phenomenon that has propelled progress beyond any linear trajectory. A future once predictable over years now transforms in mere months. He sees this uncertainty not as a threat but as a rare opportunity: a chance to create tools and ideas that more deeply intertwine human life with science.
Science, The Product of Collective Effort
Mirrokni consistently emphasizes that no success is truly meaningful unless it is shared with others. He believes that achievements are not solely the result of individual effort but stem from collaboration, trust, and the collective wisdom of research teams. This philosophy is evident throughout his scientific journey, and the Mustafa(pbuh) Prize in 2025, awarded for his work on locality-sensitive hashing based on p-stable distributions, stands as a testament to this perspective. Among his other accolades are the Best Paper Award at the ACM Conference on Electronic Commerce in 2008, the Best Student Paper Award at the ACM-SIAM Symposium in 2005, and the gold medal at Iran’s National Informatics Olympiad in 1996. His family, friends, colleagues, and research teams have all played significant roles in his successes, and these achievements are the result of mutual trust and collaboration. For this reason, he has released many algorithms and libraries related to graph neural networks and data mining as open source, enabling others to build upon them and advance scientific progress. For him, science is always a collective endeavor, and no achievement is complete without others’ contributions.
A Life Beyond Algorithms
Mirrokni’s life is not confined to equations and algorithms. From his teenage years playing soccer and ping-pong, where he learned the joy of friendly competition and teamwork, to the present day, when he cherishes the small joys of playing with his children, he has always viewed the balance between science, family, and community as the key to true growth. Spending time with his children, and learning from them as they learn from him, is one of the most precious parts of his life, bringing a sense of fulfillment and joy that no scientific milestone can rival.
Alongside his work at Google Research, Mirrokni serves as a visiting professor at the Courant Institute of New York University, teaching algorithms and internet economics. He advises the younger generation: “Now is the best time to dive into research. The rapid advancements in artificial intelligence have created a unique opportunity to turn your dreams into reality faster than ever before. But don’t forget: if you delegate everything to AI, your mind will miss the chance to grow and evolve.” He envisions a future where humans and AI collaborate to solve complex mathematical problems, with algorithms enhancing daily life in fields like medicine and the social sciences.
Mirrokni’s story demonstrates that when individual curiosity and effort are intertwined with collaboration and innovation, they can move the world forward. His contributions to the development of algorithms and scientific methods not only advance knowledge but also enable practical applications in future projects and research, paving the way for the next generation of scientific development.

About the Work

In Search of Similarity

Have you ever finished a book and immediately felt like you just lost a friend? A book that spoke to you not just through its content, but through its atmosphere, its prose, and something intangible woven between its lines. Now imagine you’re searching for another book that can rekindle that same feeling. You step into a vast library with shelves arranged in disarray. Novels, books of philosophy, science, history—all jumbled together without clear categories. You begin flipping through the books, hoping to find a familiar spark. As time passes, exhaustion sets in. The books are countless, and what you’re seeking isn’t easily found. Finally, you sit down behind one of the library’s computers. You write a description of that beloved book, and now this human desire becomes a machine’s task. In the world of computers, the challenge grows more complex. This search engine must sift through billions of books to find one that matches your request. How does a computer, navigating a sea of data, identify items that are close in meaning or structure? More importantly, how can it do this quickly and accurately without examining every piece of data one by one? The answer lies in a method that, instead of relying on feelings, uses the language of numbers and formulas to understand similarities—an algorithm based on p-stable distributions, developed by researchers like Vahab Mirrokni, enabling computers to intelligently and rapidly identify similar data without scouring the entire digital landscape.
Similarity in the Language of Numbers
At first glance, similarity might seem like a simple concept. But when we step into the world of data, this simple idea takes on a more precise and distinct form. For computers, everything is just a sequence of numbers. A photo is a list of numbers representing pixels, and a recorded sound is a series of digits capturing frequency fluctuations. In a world where everything is reduced to numbers, similarity must also be defined in terms of numbers. In such a space, to determine how alike two things are, we need to measure how far apart they are. In the logic of machines, the smaller the distance between two sets of data, the less they differ. This is why the concept of distance becomes our primary tool for assessing similarity. However, measuring this distance is itself a significant challenge, as there are various ways to calculate it. To measure this closeness, a family of measures called Lp norms is used. These rely on a single general formula in which changing a number called p alters our perspective on distance. For example, imagine you’ve drawn two points on a piece of paper and want to measure the distance between them. If you place a ruler to draw a straight line between the points, you’re measuring the shortest possible path. This is the case when p equals 2, known in mathematics as the Euclidean distance. Now imagine you can only move vertically or horizontally to get from one point to another. In this case, the distance is calculated by adding the horizontal and vertical movements. This is what happens when p equals 1, known as the Manhattan distance. Essentially, the value of p determines what kind of differences between data the system prioritizes.
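To make the two rulers concrete, here is a minimal Python sketch (the vectors and the values of p are chosen arbitrarily for illustration); both distances come from the same Lp formula, with only p changing:

```python
import numpy as np

def lp_distance(x, y, p):
    """Distance between vectors x and y under the Lp norm:
    (sum_i |x_i - y_i|^p) ** (1/p)."""
    diff = np.abs(np.asarray(x) - np.asarray(y))
    return float(np.sum(diff ** p) ** (1.0 / p))

a, b = [0.0, 0.0], [3.0, 4.0]
print(lp_distance(a, b, p=1))  # Manhattan: 3 + 4 = 7.0
print(lp_distance(a, b, p=2))  # Euclidean: sqrt(3**2 + 4**2) = 5.0
```

The Manhattan ruler adds the two legs of the trip, while the Euclidean ruler cuts straight across; the same pair of points yields 7.0 and 5.0 respectively.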
The Computer’s Ruler
Now, let’s bring this concept of distance into the digital realm, where data is no longer images, sounds, or sentences but vectors of numbers. As mentioned earlier, computers measure similarity between two images or texts by calculating the distance between their vectors. For instance, when a search engine determines whether two phrases refer to the same topic, or when a music app suggests similar songs, what’s happening behind the scenes is a comparison of vectors. Depending on whether the algorithm prioritizes high accuracy or greater speed, different values of p can be used. If we want to focus on fine, precise differences, p=1 is a good choice, as it weighs all differences equally. For a broader perspective, p=2 is more suitable, allowing the computer to estimate distances between vectors more quickly. Importantly, for all values of p≥1, the Lp distance is a valid metric, preserving mathematical properties like the triangle inequality. If p<1 is used, the formula can still be applied, but the result is no longer a true metric, as it violates the triangle inequality; this case is mostly relevant in theoretical discussions or specific applications. In data science and machine learning, p≥1 is typically used because it is more intuitive and mathematically robust. Nonetheless, innovative research, such as that by Mirrokni, has made effective use of p<1, enabling computers to detect differences better and faster than ever before.
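The triangle-inequality failure for p<1 is easy to verify numerically. In this small check (points chosen arbitrarily, reusing the lp_distance sketch from above), a two-leg detour through a corner point comes out “shorter” than the direct trip:

```python
import numpy as np

def lp_distance(x, y, p):
    diff = np.abs(np.asarray(x) - np.asarray(y))
    return float(np.sum(diff ** p) ** (1.0 / p))

# Three corners of a unit square; z lies between x and y.
x, z, y = [0.0, 0.0], [1.0, 0.0], [1.0, 1.0]
p = 0.5
print(lp_distance(x, y, p))                         # 4.0: the direct trip
print(lp_distance(x, z, p) + lp_distance(z, y, p))  # 2.0: the detour is "shorter"
```

Since 4.0 > 2.0, the inequality d(x, y) ≤ d(x, z) + d(z, y) fails, which is exactly why p<1 does not define a true metric.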
A Shortcut in the City of Data
No matter how effective our method for measuring similarity between data points is, we still face a major challenge: the realm of data is boundless. Millions of images, texts, sounds, and videos are stored in computers, and comparing each one individually to find a specific file would take an enormous amount of time. This is where a clever algorithm, developed through the efforts of researchers like Mirrokni, comes into play: Locality-Sensitive Hashing, or LSH. This method enables rapid data categorization by grouping similar items together. But how is this possible with such vast amounts of information? Instead of directly comparing long vectors, LSH uses specialized mathematical functions called hash functions to transform them into shorter, summarized vectors that still retain essential information. It’s like having a smart summary of a book that captures its essence without needing to read every page. To preserve the approximate distance between these summarized vectors, LSH relies on a tool called p-stable distributions. These distributions provide random vectors of numbers, which, through a series of algebraic operations with the original vector, produce a condensed version of the data. The magic of these distributions lies in their ability to make the distances between the outputs a good approximation of the distances between the original data. This means we can determine which data points are closer to each other using a short vector for each piece of data, without handling the entire dataset. Another key aspect is that the type of p-stable distribution used depends on the distance being measured. For example, if we’re calculating Euclidean distance, the random vectors must come from the Gaussian distribution, which is 2-stable and thus suited for p=2; for other values of p, other stable distributions are used, such as the Cauchy distribution for p=1. This approach allows similar data to be categorized quickly without exhaustive searches.
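For readers who want to see the mechanics, here is a minimal sketch of one such hash function for the Euclidean case, following the floor-of-projection form h(v) = ⌊(a·v + b)/r⌋ used in p-stable LSH; the bucket width r, the random seed, and the example vectors are illustrative choices, not prescribed values:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_hash(dim, r=4.0):
    """One p-stable LSH function for Euclidean (p=2) distance:
    h(v) = floor((a.v + b) / r), where the projection vector a is
    drawn from the Gaussian (2-stable) distribution, b is a random
    offset, and r is the bucket width."""
    a = rng.normal(size=dim)
    b = rng.uniform(0.0, r)
    return lambda v: int(np.floor((np.dot(a, v) + b) / r))

h = make_hash(dim=3)
# Nearby vectors usually land in the same bucket; distant ones usually don't.
print(h([1.0, 2.0, 3.0]), h([1.05, 2.0, 2.95]))
print(h([9.0, -6.0, 7.0]))
```

In a full system, several such functions are combined into composite keys across multiple hash tables, and only the candidates that land in the same bucket are compared exactly.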
This innovative method eliminates the need to completely reformat data or force it into complex frameworks. Its simplicity is what leads to astonishing speed: in some experiments, LSH has performed up to 40 times faster than traditional methods like the kd-tree, even in challenging scenarios where p is less than 1. This intelligent summarization turns LSH into a seasoned assistant for spotting similarity. In the world of numbers and vectors, there may be no emotions involved, but similarity can be detected at remarkable speed.