# 从七桥问题开始：全面介绍图论及其应用

#### 七桥问题

Leonhard Euler

• 有限无向图 G(V,E) 的 Euler 路径是指 G 的每一个边都只出现一次的路径。如果 G 有一条 Euler 路径，它就被称之 Euler 图。[注释 1]

• 定理：有且仅有两个确定的节点存在奇数自由度，其它的节点都有偶数自由度，那么该有限无向图为 Euler 图。【1】

#### 图表征：前言

```struct BinTreeNode
{
T value; // don't bother with template<>
TreeNode* left;
TreeNode* right;
};
class BinTree
{
public:
// insert, remove, find, bla bla
private:
BinTreeNode* root_;
};```

```BinTreeNode<Apple>* root = new BinTreeNode<Apple>("Green");

root->left = new BinTreeNode<Apple>("Yellow");
root->right = new BinTreeNode<Apple>("Yellow 2");

BinTreeNode<Apple>* yellow_2 = root->right;

yellow_2->left = new BinTreeNode<Apple>("Almost red");
yellow_2->right = new BinTreeNode<Apple>("Red");```

#### Airbnb

Airbnb 房源搜索一瞥：

```// feel free to reorganize this struct to avoid redundant space
// usage because of aligning factor
// Remark 1: some of the properties could be expressed as enums,
// bitset is chosen for as multi-value enum holder.
// Remark 2: for most of the count values the maximum is 16
// Remark 3: price value considered as integer,
// int considered as 4 byte.
// Remark 4: neighborhoods property omitted
// Remark 5: to avoid spatial queries, we're
// using only country code and city name, at this point won't consider
// the actual coordinates (latitude and longitude)
struct AirbnbHome
{
wstring name; // wide string
uint price;
uchar rating;
uint rating_count;
vector<string> photos; // list of photo URLs
string host_id;
uchar children_number; // max is 5
uchar infants_number; // max is 5
bitset<3> home_type;
uchar beds_number;
uchar bedrooms_number;
uchar bathrooms_number;
bitset<21> accessibility;
bool superhost;
bitset<20> amenities;
bitset<6> facilities;
bitset<34> property_types;
bitset<32> host_languages;
bitset<3> house_rules;
ushort country_code;
string city;
};```

```bool HasWasher(AirbnbHome* h)
{
return h->amenities[2];
}```

```const int KITCHEN = 0;
const int HEATING = 1;
const int WASHER = 2;
//...
bool HasWasher(AirbnbHome* h)
{
return (h != nullptr) && h->amenities[WASHER];
}

bool HasWasherAndKitchen(AirbnbHome* h)
{
return (h != nullptr) && h->amenities[WASHER] && h->amenities[KITCHEN];
}

bool HasAllAmenities(AirbnbHome* h, const std::vector<int>& amenities)
{
bool has = (h != nullptr);
for (const auto a : amenities) {
has &= h->amenities[a];
}
return has;
}```

`// Note the comments struct AirbnbHome {  wstring name; // 200 bytes  uint price; // 4 bytes  uchar rating; // 1 byte  uint rating_count; // 4 bytes  vector<string> photos; // 100 bytes  string host_id; // 20 bytes  uchar adults_number; // 1 byte  uchar children_number; // 1 byte  uchar infants_number; // 1 byte  bitset<3> home_type; // 4 bytes  uchar beds_number; // 1 byte  uchar bedrooms_number; // 1 byte  uchar bathrooms_number; // 1 byte  bitset<21> accessibility; // 4 bytes  bool superhost; // 1 byte  bitset<20> amenities; // 4 bytes  bitset<6> facilities; // 4 bytes  bitset<34> property_types; // 8 bytes  bitset<32> host_languages; // 4 bytes, correct me if I'm wrong  bitset<3> house_rules; // 4 bytes  ushort country_code; // 2 bytes  string city; // 50 bytes };`

（1）目标节点被发现；

（2）如果要找的值小于节点的值，则匹配将继续到节点的左子树；

（3）如果要找的值大于节点的值，则匹配将继续到节点的右子树。

#### 图表征：进阶

N-ary 树更像是模拟一个图。

```struct NTreeNode
{
T value;
vector<NTreeNode*> children;
};```

```// almost pseudocode
class NTree
{
public:
void Insert(const T&);
void Remove(const T&);
// lines of omitted code
private:
NTreeNode* root_;
};```

```struct GraphNode
{
T value;
};```

```class ConnectedGraph
{
public:
// API

private:
GraphNode* root_;
};```

```class DisconnectedGraphOrJustAGraph
{
public:
// API

private:
std::vector<GraphNode*> all_roots_;
};```

```// A representation of a graph with both vertex and edge tables
// Vertex table is a hashtable of edges (mapped by label)
// Edge table is a structure with 4 fields
// VELO = Vertex Edge Label Only (e.g. no vertex payloads)

class ConnectedVELOGraph {
public:
struct Edge {
Edge(const std::string& f, const std::string& t)
: from(f)
, to(t)
, used(false)
, next(nullptr)
{}
std::string ToString() {
return (from + " - " + to + " [used:" + (used ? "true" : "false") + "]");
}

std::string from;
std::string to;
bool used;
Edge* next;
};

ConnectedVELOGraph() {}
~ConnectedVELOGraph() {
vertices_.clear();
for (std::size_t ix = 0; ix < edges_.size(); ++ix) {
delete edges_[ix];
}
}

public:
void InsertEdge(const std::string& from, const std::string& to) {
Edge* e = new Edge(from, to);
InsertVertexEdge_(from, e);
InsertVertexEdge_(to, e);
edges_.push_back(e);
}

public:
void Print() {
for (auto elem : edges_) {
std::cout << elem->ToString() << std::endl;
}
}

std::vector<std::string> Trace(const std::string& v) {
std::vector<std::string> path;
Edge* e = vertices_[v];
while (e != nullptr) {
if (e->used) {
e = e->next;
} else {
e->used = true;
path.push_back(e->from + ":-:" + e->to);
e = vertices_[e->to];
}
}
return path;
}

private:
void InsertVertexEdge_(const std::string& label, Edge* e) {
if (vertices_.count(label) == 0) {
vertices_[label] = e;
} else {
vertices_[label]->next = e;
}
}

private:
std::unordered_map<std::string, Edge*> vertices_;
std::vector<Edge*> edges_;
};```

`// 'author' represents the User object, at this point we are interested only in author.id // // 'tw' is a Tweet object, at this point we are interested only in 'tw.id' void DeliverATweet(User* author, Tweet* tw) {  // we assume that 'tw' object is already stored in a database  // 1. Get the list of user's followers (author of the tweet)  vector<User*> user_followers = GetUserFollowers(author->id);  // 2. insert tweet into each timeline  for (auto follower : user_followers) {    InsertTweetIntoUserTimeline(follower->id, tw->id);  } }`

`// Warning: a bunch of pseudocode ahead void RangeInsertIntoTimelines(vector<long> user_ids, long tweet_id) {  for (auto id : user_ids) {    InsertIntoUserTimeline(id, tweet_id);  } } void DeliverATweet(User* author, Tweet* tw) {  // we assume that 'tw' object is already stored in a database  // 1. Get the list of user's (tweet author's) followers's ids  vector<long> user_followers = GetUserFollowers(author->id);  // 2. Insert tweet into each timeline in parallel  const int CHUNK_SIZE = 4000; // saw this somewhere  for (each CHUNK_SIZE elements in user_followers) {    Thread t = ThreadPool.GetAvailableThread(); // somehow    t.Run(RangeInsertIntoTimelines, current_chunk, tw->id);  } }`

Airbnb 过滤器节选

```house_type: "entire_place",
price_range_start: 56,
price_range_end: 80,
beds_number: 2,
amenities: ["tv", "wifi", "laptop friendly workspace"],
facilities: ["gym"]```

#### 图算法

• Coloring

• Hopcroft–Karp

• Hungarian

• Prüfer coding

• Tarjan's off-line least common ancestors

• Topological sort

Hopcroft-Karp 算法（以及很多其它算法）都基于 DFS（深度优先搜索）和 BFS（广度优先搜索）的图遍历算法。说实话，这里介绍 Hopcroft-Karp 算法的真正原因是逐渐转换到图遍历算法的讨论，相比从二值树开始讨论会更好。

```// struct ListNode {
//   ListNode* next;
//   T item;
// };

void TraverseRecursive(ListNode* node) // starting node, most commonly the list 'head'
{
if (!node) return; // stop
std::cout << node->item;
TraverseRecursive(node->next); // recursive call
}

void TraverseIterative(ListNode* node)
{
while (node) {
std::cout << node->item;
node = node->next;
}
}```

• 输出节点值，然后到达左节点，再到达右节点。

• 到达左节点，然后输出节点值，再到达右节点。

• 到达左节点，然后到达右节点，再输出节点值。

`// struct TreeNode { //   T item; //   TreeNode* left; //   TreeNode* right; // } // you cann pass a callback function to do whatever you want to do with the node's value // in this particular example we are just printing its value. // node is the "starting point", basically the first call is done with the "root" node void PreOrderTraverse(TreeNode* node) {  if (!node) return; // stop  std::cout << node->item;  PreOrderTraverse(node->left); // do the same for the left sub-tree  PreOrderTraverse(node->right); // do the same for the right sub-tree } void InOrderTraverse(TreeNode* node) {  if (!node) return; // stop  InOrderTraverse(node->left);  std::cout << node->item;  InOrderTraverse(node->right); } void PostOrderTraverse(TreeNode* node) {  if (!node) return; // stop  PostOrderTraverse(node->left);  PostOrderTraverse(node->right);  std::cout << node->item; }`

#### 实例：Netflix

「KISK」或「让我们保持它的简单」或「为了简单起见」，这是教程编写者从真实问题中抽象出来的超级借口，并通过在伪代码中引入「abc」简单示例及其解决方案来做出大量假设，并且这些答案在很老的笔记本电脑上也能工作。

Netflix 和亚马逊。Netflix 提供电影服务，亚马逊提供产品，我们会将它们命名为「物品」，所以每当你阅读「物品」时，都会想到 Netflix 中的电影或亚马逊的任何 [合格] 产品。这些物品最常用的是解析其标题和描述（我们只处理标题），所以如果一个操作员（通常是一个人通过管理仪表板将项目的数据插入 Netflix / Amazon 数据库）插入新项目到数据库中，它的标题正在被一些「ItemTitleProcessor」处理以产生关键字。

`// Cached representation of an Item // Full Item object (with title, description, comments etc.) // could be fetched from the database struct Item {  // ID_TYPE is the type of Item's unique id, might be an integer, or a string  ID_TYPE id;  int rating; };`

```// this is a pseudocode, that's why I didn't bother with "const&"'s and "std::"'s
// though it could have look better, forgive me C++ fellows

vector<Item*> GetItemsByKeywordInSortedOrder(string keyword)
{
// assuming IndexTable is a big hashtable mapping keywords to Item BSTs
BST<Item*> items = IndexTable[keyword];

// suppose BST has a function InOrderProduceVector(), which creates a vector and
// inserts into it items fetched via in-order traversing the tree
vector<Item*> sorted_result = items.InOrderProduceVector();
return sorted_result;
}```

```template <typename BlaBla>
class BST
{
public:
// other code ...
vector<BlaBla*> InOrderProduceVector()
{
vector<BlaBla*> result;
result.reserve(1000); // magic number, reserving a space to avoid reallocation on inserts
InOrderProduceVectorHelper_(root_, result); // passing vector by reference
return result;
}

protected:
// takes a reference to vector
void InOrderProduceVectorHelper_(BSTNode* node, vector<BlaBla*>& destination)
{
if (!node) return;
InOrderProduceVectorHelper_(node->left, destination);
destination.push_back(node->item);
InOrderProduceVectorHelper_(node->right, destination);
}

private:
BSTNode* root_;
};```

```// Reminder: this is pseudocode, no bother with "const&", "std::" or others
// forgive me C++ fellows

template <typename BlaBla>
class BST
{
public:
// other code ...

vector<BlaBla*> ReverseInOrderProduceVector(int offset, int limit)
{
vector<BlaBla*> result;
result.reserve(limit);
// passing result vector by reference
// and passing offset and limit
ReverseInOrderProduceVectorHelper_(root_, result, offset, limit);
return result;
}

protected:
// takes a reference to vector
// skips 'offset' nodes and inserts up to 'limit' nodes
void ReverseInOrderProduceVectorHelper_(BSTNode* node, vector<BlaBla*>& destination, int offset, int limit)
{
if (!node) return;
if (limit == 0) return;
--offset; // skipping current element
ReverseInOrderProduceVectorHelper_(node->right, destination, offset, limit);
if (offset <= 0) { // if skipped enough, insert
destination.push_back(node->value);
--limit; // keep the count of insertions
}
ReverseInOrderProduceVectorHelper_(node->left, destination, offset, limit);
}

private:
BSTNode* root_;
};

// ... other possibly useful code

// this is a pseudocode, that's why I didn't bother with "const&"'s and "std::"'s
// though it could have look better, forgive me C++ fellows

vector<Item*> GetItemsByKeywordInSortedOrder(string keyword, offset, limit) // pagination using offset and limit
{
// assuming IndexTable is a big hashtable mapping keywords to Item BSTs
BST<Item*> items = IndexTable[keyword];

// suppose BST has a function InOrderProduceVector(), which creates a vector and
// inserts into it items fetched via reverse in-order traversing the tree
// to get items in descending order (starting from the highest rated item)
vector<Item*> sorted_result = items.ReverseInOrderProduceVector(offset, limit);
return sorted_result;
}```

#### DFS vs. BFS

• 深度优先搜索——更多动作，更少思考。

• 广度优先搜索——在进一步行动之前先仔细观察四周。

DFS 很像前序/顺序/后序遍历，而 BFS 用于分层输出树节点。我们需要一个队列（数据结构）来存储图的「层级」，同时输出（访问）其「父级」。在之前的插图中，节点是队列中浅蓝色的点。每一层的节点被从队列中取走，同时在访问每个被取走的节点时，我们还应该将其子节点插入队列（为下一层做准备）。下列代码很简单，可以帮助大家了解 BFS。代码假设图是连通的，尽管我们可以修改代码，使其应用于非连通图。

`// Assuming graph is connected // and a graph node is defined by this structure // struct GraphNode { //   T item; //   vector<GraphNode*> children; // } // WARNING: untested code void BreadthFirstSearch(GraphNode* node) // start node {  if (!node) return;  queue<GraphNode*> q;  q.push(node);  while (!q.empty()) {    GraphNode* cur = q.front(); // doesn't pop    q.pop();    for (auto child : cur->children) {      q.push(child);    }    // do what you want with current node    cout << cur->item;  } }`

#### 示例：Uber

Uber 有 5000 万用户、700 万司机，对于 Uber 来说，最重要的事情之一就是高效匹配司机和乘客。该问题首先是定位的问题。后端要处理数百万份用户请求，将每份请求发送至一或多（通常是多）位附近的司机。尽管将用户请求发送至所有附近司机更加简单，有时也更加智能，但是预处理通常会有所帮助。

1. 标记所有未访问节点。创建所有未访问节点的集合「unvisited set」。

2. 为每个节点分配一个实验距离值：初始节点的距离值设置为 0，其他节点设置为无穷大。将初始节点设置为当前节点。

3. 对于当前节点，考虑其所有未访问近邻，通过当前节点计算它们的实验距离。对比新计算的实验距离和当前分配的值，分配较小的值。例如，如果当前节点 A 的距离是 6，连接 A 与近邻 B 的边长度为 2，则经过 A 到 B 的距离是 6 + 2 = 8。如果 B 之前标记的距离大于 8，则将其更改为 8。反之，保留当前值。

4. 当我们考虑完当前节点的所有近邻之后，将当前节点标记为已访问，并从 unvisited set 中移除。已访问节点无需再检查。

5. 如果目标节点已经标记为已访问（当规划两个特定节点之间路线的时候），或 unvisited set 中节点之间的最小实验距离是无穷大（当规划完整遍历时，初始节点和其余未访问节点之间没有连接时），则停止，算法结束。

6. 反之，选择标记有最小实验距离的未访问节点，将其设置为新的「当前节点」，并返回第 3 步。