Matching up users with specific needs to suppliers who can meet those needs is a common theme in the plethora of apps and services that are budding in today’s “sharing economy”. In this post, I’ll explore a simple mathematical concept underlying several different real world scenarios.
Imagine you run a dating company. You have a group of 50 men and 50 women, and your job is to pair up men and women with one another to go off on a blind date. You’ve done your due diligence, and have come up with a way of measuring how good of a match a chosen man and woman might be. Perhaps you’ve obtained information about each participants interests, preferences etc. In any case, you can assign a score to each possible pairing of man and woman, so in this case you have a list of 502 = 2500 scores. How do we efficiently pick the best matching? We want to maximize the chance that the customers have a good time during their dates, after all.
Now imagine you run Uber. Seems daunting. Let’s say you run the logistics department at Uber, because you know next to nothing about running a business. Here’s a problem you might encounter: In a highly populated area like NYC, in a given span of 20 seconds during rush hour, Uber might receive 100 requests from costumers to call a cab. Let’s say there are 150 drivers available, each a certain amount of time away from each customer. For each user, we can assign a score to each potential pairing of a customer with a driver. The sooner the driver could arrive at the customer, the better the score of the pairing. We don’t want the customers waiting too long, so how can we efficiently allocate drivers to customers?
One more scenario. I promise this will be the last one. Imagine you are hosting an educational service that attempts to pair up people with complementary skills so that they can both gain from the others experience. Suppose each person has a list of their current qualifications, and a list of skills they would like to learn. If Person A is a skilled programmer, and would like to learn about optimization analysis, and Person B is an optimization analyst and would like to learn how to program, then these two people could probably learn a lot from one another! So give a large group of people, each with a set of current skills and desired skills, how might we pair these people off in an optimal way? There are various notions of what an “optimal” pairing might be, but let’s say we use the number of complementary skills that the given pair has to determine its quality.
These scenarios all turn out to have identical mathematical descriptions, despite their varied settings. The method for solving the problems presents itself by translating everything into the language of graph theory.
The notion of a graph is a very fundamental mathematical concept. For our purposes a graph consists of a set of vertices, and a set of edges connecting some pairs of distinct vertices. Graphs are ubiquitous throughout mathematics because they capture the notion of a set of objects (vertices) and some form of relationship between the objects (edges). Depending on the setting, the objects and relations that vertices and edges stand for can vary widely. One can also consider a graph as a mathematical object in its own right. Graph theory is the study of mathematical properties of graphs.
Once we have down the notion of a graph, we can consider a weighted graph, which is just a graph with a number attached to each edge. We’ll actually use weighted graphs when thinking about the earlier scenarios. We need one more mathematical concept to complete the translation of our problems into graph theoretic terms.
A matching of a graph G is a selection of edges of G such that each vertex occurs in at most one pairing.
Observe that different matchings can have a different number of total edges in it. This depends on the exact nature of the graph. Therefore, it makes mathematical sense to ask the question: Given a graph G, what is the matching with the largest number of edges? Can we find these maximal matchings algorithmically and efficiently? If our graph G is weighted, we can ask what is the matching which maximizes the sum of the weights? If all of the weights in G are equal, then the second question is equivalent to the first one. In our example above, the matching in the upper right maximizes the sum of the weights, while the lower matching maximizes the number of edges in the matching.
One way to solve this problem for a given graph is to simply try out all possible matchings and take the best one. However, if the graph is large, there is a computationally unfeasible number of pairings to iterate through. It turns out there is an elegant and efficient solution to this problem, called Edmond’s Blossom Algorithm, discovered by Jack Edmonds in 1967.
Very roughly speaking, Edmond’s algorithm works by first picking a matching at random. It then uses an ingenious method to find an improvement to the matching if one exists. It keeps repeating this process until no further improvement can be found.
I’ll go into some of the more technical details of the algorithm in a later post. For now let’s treat it as a black box that produces an optimal answer to the graph theoretic questions in the previous paragraph. Let’s go through our earlier scenarios and frame them in this new language.
In the first scenario, we construct a bipartite graph (which means the vertices can be separated into two groups such that within each group, no two vertices are connected by an edge). We add an edge for each potential matching, and we weight the edge by how good of a match we think they’ll make.
In the second scenario, we also construct a bipartite graph (the two groups of vertices are drivers and customers) and weight each edge by how long it would take the driver to arrive at the passenger.
In our last hypothetical scenario, each vertex represents a user, and two users share an edge if they have complementary skills. Each edge could be weighted by some measure of how good the matching is.
In each of these scenarios, the question we want to answer is to find the maximal matching of the associated weighted graph. Edmond’s algorithm gives us the solution in each case!