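Load your data

In scikit-network, a graph is represented by its adjacency matrix (or its biadjacency matrix, for a bipartite graph), stored in SciPy's Compressed Sparse Row (CSR) format.

This tutorial presents several ways to instantiate a graph in this format.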
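As a reminder of what the CSR format stores, here is a minimal pure-Python sketch that builds the three underlying arrays from a small dense matrix. The matrix is illustrative only. The names `indptr`, `indices` and `data` mirror the attributes of a SciPy CSR matrix, but this is not how SciPy builds them internally.

```python
# Dense adjacency matrix of a small undirected graph (illustrative data).
dense = [
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
]

indptr = [0]   # indptr[i]:indptr[i + 1] delimits row i in the two other arrays
indices = []   # column index of each stored (non-zero) entry
data = []      # value of each stored entry

for row in dense:
    for j, value in enumerate(row):
        if value != 0:
            indices.append(j)
            data.append(value)
    indptr.append(len(indices))

# The neighbors of node 1 are a slice of `indices`; no scan of zeros is needed.
neighbors_of_1 = indices[indptr[1]:indptr[2]]
print(indptr, indices, data, neighbors_of_1)
```

Only the non-zero entries are stored, which is why this format suits large graphs with few edges per node.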

[1]:

from IPython.display import SVG

import numpy as np
from scipy import sparse
import pandas as pd

from sknetwork.data import from_edge_list, from_adjacency_list, from_graphml, from_csv
from sknetwork.visualization import visualize_graph, visualize_bigraph

From a NumPy array

For small graphs, you can instantiate the adjacency matrix as a dense NumPy array and convert it into a sparse matrix in CSR format.

[2]:

adjacency = np.array([[0, 1, 1, 0], [1, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 0]])
adjacency = sparse.csr_matrix(adjacency)

image = visualize_graph(adjacency)
SVG(image)

[2]:

../../_images/tutorials_data_load_data_4_0.svg
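After such a conversion, it can be worth checking that the matrix describes the graph you intended. The sketch below uses only NumPy and SciPy calls on the same matrix as above. The variable names `is_symmetric`, `degrees` and `n_edges` are ours.

```python
import numpy as np
from scipy import sparse

dense = np.array([[0, 1, 1, 0], [1, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 0]])
adjacency = sparse.csr_matrix(dense)

# An undirected graph has a symmetric adjacency matrix.
is_symmetric = (adjacency != adjacency.T).nnz == 0

# Row sums give the node degrees of this unweighted graph.
degrees = np.asarray(adjacency.sum(axis=1)).ravel()

# Without self-loops, each undirected edge is stored twice (once per direction).
n_edges = adjacency.nnz // 2

print(is_symmetric, degrees, n_edges)
```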

From an edge list

Another natural way to build a graph is from a list of edges.

[3]:

edge_list = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
adjacency = from_edge_list(edge_list)

image = visualize_graph(adjacency)
SVG(image)

[3]:

../../_images/tutorials_data_load_data_6_0.svg

By default, the graph is undirected, but you can easily make it directed.

[4]:

adjacency = from_edge_list(edge_list, directed=True)

image = visualize_graph(adjacency)
SVG(image)

[4]:

../../_images/tutorials_data_load_data_8_0.svg

You might also want to add weights to your edges. Just use triplets instead of pairs!

[5]:

edge_list = [(0, 1, 1), (1, 2, 0.5), (2, 3, 1), (3, 0, 0.5), (0, 2, 2)]
adjacency = from_edge_list(edge_list)

image = visualize_graph(adjacency)
SVG(image)

[5]:

../../_images/tutorials_data_load_data_10_0.svg
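To make the link between an edge list and the CSR matrix concrete, here is a rough sketch using SciPy only. The helper `edge_list_to_csr` is hypothetical and is not the implementation of `from_edge_list`. It assumes integer node indices and treats "undirected" as storing each edge in both directions.

```python
import numpy as np
from scipy import sparse


def edge_list_to_csr(edge_list, directed=False):
    """Hypothetical helper: build a CSR adjacency matrix from integer edges.

    Each edge is (source, target) or (source, target, weight).
    """
    rows = np.array([edge[0] for edge in edge_list])
    cols = np.array([edge[1] for edge in edge_list])
    # Pairs get an implicit weight of 1.
    weights = np.array([edge[2] if len(edge) > 2 else 1 for edge in edge_list])
    n = int(max(rows.max(), cols.max())) + 1
    # Duplicate (row, col) entries are summed when the matrix is built.
    adjacency = sparse.csr_matrix((weights, (rows, cols)), shape=(n, n))
    if not directed:
        # Assumption: an undirected edge is stored in both directions.
        adjacency = sparse.csr_matrix(adjacency + adjacency.T)
    return adjacency


edges = [(0, 1, 1), (1, 2, 0.5), (2, 3, 1), (3, 0, 0.5), (0, 2, 2)]
directed = edge_list_to_csr(edges, directed=True)
undirected = edge_list_to_csr(edges)
print(directed.nnz, undirected.nnz)
```

Note that this symmetrization would double the weight of a self-loop, a case the sketch does not handle.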

You can instantiate a bipartite graph as well.

[6]:

edge_list = [(0, 0), (1, 0), (1, 1), (2, 1)]
biadjacency = from_edge_list(edge_list, bipartite=True)

image = visualize_bigraph(biadjacency)
SVG(image)

[6]:

../../_images/tutorials_data_load_data_12_0.svg

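If the nodes are identified by names rather than integer indices, you get an object of type Bunch that holds the adjacency matrix together with graph attributes (the node names).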

[7]:

edge_list = [("Alice", "Bob"), ("Bob", "Carey"), ("Alice", "David"), ("Carey", "David"), ("Bob", "David")]
graph = from_edge_list(edge_list)

[8]:

graph

[8]:

{'names': array(['Alice', 'Bob', 'Carey', 'David'], dtype='<U5'),
 'adjacency': <4x4 sparse matrix of type '<class 'numpy.int64'>'
        with 10 stored elements in Compressed Sparse Row format>}

[9]:

adjacency = graph.adjacency
names = graph.names

[10]:

image = visualize_graph(adjacency, names=names)
SVG(image)

[10]:

../../_images/tutorials_data_load_data_17_0.svg
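The relation between names and matrix indices can be sketched in plain Python. This is only an illustration of the idea, assuming names are indexed in sorted order, as the `names` array printed above suggests. The variables `index` and `indexed_edges` are ours.

```python
edge_list = [("Alice", "Bob"), ("Bob", "Carey"), ("Alice", "David"),
             ("Carey", "David"), ("Bob", "David")]

# Collect the distinct labels; sorting matches the order of `names` shown above.
names = sorted({node for edge in edge_list for node in edge})
index = {name: i for i, name in enumerate(names)}

# The same edges, expressed with integer indices.
indexed_edges = [(index[u], index[v]) for u, v in edge_list]
print(names)
print(indexed_edges)
```

Row and column `i` of the adjacency matrix then refer to the node `names[i]`.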

By default, the weight of each edge is the number of occurrences of the corresponding link.

[11]:

edge_list_new = edge_list + [("Alice", "Bob"), ("Alice", "David"), ("Alice", "Bob")]
graph = from_edge_list(edge_list_new)

[12]:

adjacency = graph.adjacency
names = graph.names

[13]:

image = visualize_graph(adjacency, names=names)
SVG(image)

[13]:

../../_images/tutorials_data_load_data_21_0.svg
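The counting rule can be reproduced by hand with the standard library. This is a sketch of the idea, not the library's code. It counts ordered pairs, so `("Alice", "Bob")` and `("Bob", "Alice")` would be tallied separately here.

```python
from collections import Counter

edge_list = [("Alice", "Bob"), ("Bob", "Carey"), ("Alice", "David"),
             ("Carey", "David"), ("Bob", "David")]
edge_list_new = edge_list + [("Alice", "Bob"), ("Alice", "David"), ("Alice", "Bob")]

# Count how many times each (source, target) pair occurs.
counts = Counter(edge_list_new)

# Turn the counts into explicit (source, target, weight) triplets.
weighted_edges = [(u, v, w) for (u, v), w in counts.items()]
print(weighted_edges)
```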

You can make the graph unweighted.

[14]:

graph = from_edge_list(edge_list_new, weighted=False)

[15]:

adjacency = graph.adjacency
names = graph.names

[16]:

image = visualize_graph(adjacency, names=names)
SVG(image)

[16]:

../../_images/tutorials_data_load_data_25_0.svg

Again, you can make the graph directed.

[17]:

graph = from_edge_list(edge_list, directed=True)

[18]:

graph

[18]:

{'names': array(['Alice', 'Bob', 'Carey', 'David'], dtype='<U5'),
 'adjacency': <4x4 sparse matrix of type '<class 'numpy.int64'>'
        with 5 stored elements in Compressed Sparse Row format>}

[19]:

adjacency = graph.adjacency
names = graph.names

[20]:

image = visualize_graph(adjacency, names=names)
SVG(image)

[20]:

../../_images/tutorials_data_load_data_30_0.svg

The graph can also have explicit weights.

[21]:

edge_list = [("Alice", "Bob", 3), ("Bob", "Carey", 2), ("Alice", "David", 1), ("Carey", "David", 2), ("Bob", "David", 3)]
graph = from_edge_list(edge_list)

[22]:

adjacency = graph.adjacency
names = graph.names

[23]:

image = visualize_graph(adjacency, names=names, display_edge_weight=True, display_node_weight=True)
SVG(image)

[23]:

../../_images/tutorials_data_load_data_34_0.svg

The same works for a bipartite graph.

[24]:

edge_list = [("Alice", "Football"), ("Bob", "Tennis"), ("David", "Football"), ("Carey", "Tennis"), ("Carey", "Football")]
graph = from_edge_list(edge_list, bipartite=True)

[25]:

biadjacency = graph.biadjacency
names = graph.names
names_col = graph.names_col

[26]:

image = visualize_bigraph(biadjacency, names_row=names, names_col=names_col)
SVG(image)

[26]:

../../_images/tutorials_data_load_data_38_0.svg
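In a bipartite graph, rows and columns refer to two different sets of nodes, so the biadjacency matrix is rectangular. The following sketch builds it by hand with SciPy. It assumes each side is indexed in sorted order and is not the library's implementation. The names `row_index` and `col_index` are ours.

```python
import numpy as np
from scipy import sparse

edge_list = [("Alice", "Football"), ("Bob", "Tennis"), ("David", "Football"),
             ("Carey", "Tennis"), ("Carey", "Football")]

# Rows and columns are indexed independently in a bipartite graph.
names_row = sorted({u for u, _ in edge_list})
names_col = sorted({v for _, v in edge_list})
row_index = {name: i for i, name in enumerate(names_row)}
col_index = {name: j for j, name in enumerate(names_col)}

rows = [row_index[u] for u, _ in edge_list]
cols = [col_index[v] for _, v in edge_list]
data = np.ones(len(edge_list), dtype=int)

# One row per person, one column per sport.
biadjacency = sparse.csr_matrix((data, (rows, cols)),
                                shape=(len(names_row), len(names_col)))
print(biadjacency.toarray())
```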

From an adjacency list

You can also load a graph from an adjacency list, given as a list of lists or a dictionary of lists.

[27]:

adjacency_list = [[0, 1, 2], [2, 3]]
adjacency = from_adjacency_list(adjacency_list, directed=True)

[28]:

image = visualize_graph(adjacency)
SVG(image)

[28]:

../../_images/tutorials_data_load_data_41_0.svg

[29]:

adjacency_dict = {"Alice": ["Bob", "David"], "Bob": ["Carey", "David"]}
graph = from_adjacency_list(adjacency_dict, directed=True)

[30]:

adjacency = graph.adjacency
names = graph.names

[31]:

image = visualize_graph(adjacency, names=names)
SVG(image)

[31]:

../../_images/tutorials_data_load_data_44_0.svg
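An adjacency list is just a compact way of writing an edge list. The hypothetical helper below, plain Python and not part of scikit-network, flattens both forms used above into (source, target) pairs.

```python
def adjacency_list_to_edges(adjacency_list):
    """Hypothetical helper: flatten an adjacency list into (source, target) pairs."""
    if isinstance(adjacency_list, dict):
        items = adjacency_list.items()
    else:
        # For a list of lists, the position in the outer list is the source node.
        items = enumerate(adjacency_list)
    return [(source, target) for source, targets in items for target in targets]


edges_from_list = adjacency_list_to_edges([[0, 1, 2], [2, 3]])
edges_from_dict = adjacency_list_to_edges(
    {"Alice": ["Bob", "David"], "Bob": ["Carey", "David"]}
)
print(edges_from_list)
print(edges_from_dict)
```

In the first example, node 0 lists itself as a neighbor, so the flattened edge list contains the self-loop `(0, 0)`.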

From a dataframe

Your dataframe might consist of a list of edges.

[32]:

df = pd.read_csv('miserables.tsv', sep='\t', names=['character_1', 'character_2'])

[33]:

df.head()

[33]:

	character_1	character_2
0	Myriel	Napoleon
1	Myriel	Mlle Baptistine
2	Myriel	Mme Magloire
3	Myriel	Countess de Lo
4	Myriel	Geborand

[34]:

edge_list = list(df.itertuples(index=False))

[35]:

graph = from_edge_list(edge_list)

[36]:

graph

[36]:

{'names': array(['Anzelma', 'Babet', 'Bahorel', 'Bamatabois', 'Baroness',
        'Blacheville', 'Bossuet', 'Boulatruelle', 'Brevet', 'Brujon',
        'Champmathieu', 'Champtercier', 'Chenildieu', 'Child1', 'Child2',
        'Claquesous', 'Cochepaille', 'Combeferre', 'Cosette', 'Count',
        'Countess de Lo', 'Courfeyrac', 'Cravatte', 'Dahlia', 'Enjolras',
        'Eponine', 'Fameuil', 'Fantine', 'Fauchelevent', 'Favourite',
        'Feuilly', 'Gavroche', 'Geborand', 'Gervais', 'Gillenormand',
        'Grantaire', 'Gribier', 'Gueulemer', 'Isabeau', 'Javert', 'Joly',
        'Jondrette', 'Judge', 'Labarre', 'Listolier', 'Lt Gillenormand',
        'Mabeuf', 'Magnon', 'Marguerite', 'Marius', 'Mlle Baptistine',
        'Mlle Gillenormand', 'Mlle Vaubois', 'Mme Burgon', 'Mme Der',
        'Mme Hucheloup', 'Mme Magloire', 'Mme Pontmercy', 'Mme Thenardier',
        'Montparnasse', 'MotherInnocent', 'MotherPlutarch', 'Myriel',
        'Napoleon', 'Old man', 'Perpetue', 'Pontmercy', 'Prouvaire',
        'Scaufflaire', 'Simplice', 'Thenardier', 'Tholomyes', 'Toussaint',
        'Valjean', 'Woman1', 'Woman2', 'Zephine'], dtype='<U17'),
 'adjacency': <77x77 sparse matrix of type '<class 'numpy.int64'>'
        with 508 stored elements in Compressed Sparse Row format>}

[37]:

df = pd.read_csv('movie_actor.tsv', sep='\t', names=['movie', 'actor'])

[38]:

df.head()

[38]:

	movie	actor
0	Inception	Leonardo DiCaprio
1	Inception	Marion Cotillard
2	Inception	Joseph Gordon Lewitt
3	The Dark Knight Rises	Marion Cotillard
4	The Dark Knight Rises	Joseph Gordon Lewitt

[39]:

edge_list = list(df.itertuples(index=False))

[40]:

graph = from_edge_list(edge_list, bipartite=True)

[41]:

graph

[41]:

{'names_row': array(['007 Spectre', 'Aviator', 'Crazy Stupid Love', 'Drive',
        'Fantastic Beasts 2', 'Inception', 'Inglourious Basterds',
        'La La Land', 'Midnight In Paris', 'Murder on the Orient Express',
        'The Big Short', 'The Dark Knight Rises',
        'The Grand Budapest Hotel', 'The Great Gatsby', 'Vice'],
       dtype='<U28'),
 'names': array(['007 Spectre', 'Aviator', 'Crazy Stupid Love', 'Drive',
        'Fantastic Beasts 2', 'Inception', 'Inglourious Basterds',
        'La La Land', 'Midnight In Paris', 'Murder on the Orient Express',
        'The Big Short', 'The Dark Knight Rises',
        'The Grand Budapest Hotel', 'The Great Gatsby', 'Vice'],
       dtype='<U28'),
 'names_col': array(['Brad Pitt', 'Carey Mulligan', 'Christian Bale',
        'Christophe Waltz', 'Emma Stone', 'Johnny Depp',
        'Joseph Gordon Lewitt', 'Jude Law', 'Lea Seydoux',
        'Leonardo DiCaprio', 'Marion Cotillard', 'Owen Wilson',
        'Ralph Fiennes', 'Ryan Gosling', 'Steve Carell', 'Willem Dafoe'],
       dtype='<U28'),
 'biadjacency': <15x16 sparse matrix of type '<class 'numpy.int64'>'
        with 41 stored elements in Compressed Sparse Row format>}

For categorical data, you can use pandas to get the bipartite graph between samples and features. We show an example taken from the Adult Income dataset.

[42]:

df = pd.read_csv('adult-income.csv')

[43]:

df.head()

[43]:

	age	workclass	occupation	relationship	gender	income
0	40-49	State-gov	Adm-clerical	Not-in-family	Male	<=50K
1	50-59	Self-emp-not-inc	Exec-managerial	Husband	Male	<=50K
2	40-49	Private	Handlers-cleaners	Not-in-family	Male	<=50K
3	50-59	Private	Handlers-cleaners	Husband	Male	<=50K
4	30-39	Private	Prof-specialty	Wife	Female	<=50K

[44]:

df_binary = pd.get_dummies(df, sparse=True)

[45]:

df_binary.head()

[45]:

	age_20-29	age_30-39	age_40-49	age_50-59	age_60-69	age_70-79	age_80-89	age_90-99	workclass_ ?	workclass_ Federal-gov	...	relationship_ Husband	relationship_ Not-in-family	relationship_ Other-relative	relationship_ Own-child	relationship_ Unmarried	relationship_ Wife	gender_ Female	gender_ Male	income_ <=50K	income_ >50K
0	False	False	True	False	False	False	False	False	False	False	...	False	True	False	False	False	False	False	True	True	False
1	False	False	False	True	False	False	False	False	False	False	...	True	False	False	False	False	False	False	True	True	False
2	False	False	True	False	False	False	False	False	False	False	...	False	True	False	False	False	False	False	True	True	False
3	False	False	False	True	False	False	False	False	False	False	...	True	False	False	False	False	False	False	True	True	False
4	False	True	False	False	False	False	False	False	False	False	...	False	False	False	False	False	True	True	False	True	False

5 rows × 42 columns

[46]:

biadjacency = df_binary.sparse.to_coo()

[47]:

biadjacency = sparse.csr_matrix(biadjacency)

[48]:

# biadjacency matrix of the bipartite graph
biadjacency

[48]:

<32561x42 sparse matrix of type '<class 'numpy.bool_'>'
        with 195366 stored elements in Compressed Sparse Row format>

[49]:

# names of columns
names_col = list(df_binary)

[50]:

len(names_col)

[50]:

42

[51]:

names_col[:8]

[51]:

['age_20-29',
 'age_30-39',
 'age_40-49',
 'age_50-59',
 'age_60-69',
 'age_70-79',
 'age_80-89',
 'age_90-99']
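The same steps can be tried on a tiny made-up table, which makes the result easy to inspect. The data frame below is invented for illustration. The calls are the ones used in the cells above.

```python
import pandas as pd
from scipy import sparse

# Toy categorical data: 3 samples, 2 categorical features (made-up values).
df = pd.DataFrame({"color": ["red", "blue", "red"], "size": ["S", "M", "M"]})

# One binary column per (feature, value) pair, stored as sparse columns.
df_binary = pd.get_dummies(df, sparse=True)

# Samples are rows, feature values are columns of the biadjacency matrix.
biadjacency = sparse.csr_matrix(df_binary.sparse.to_coo())
names_col = list(df_binary)

print(names_col)
print(biadjacency.shape, biadjacency.nnz)
```

Each sample is linked to exactly one value per feature, so the number of stored entries equals the number of samples times the number of features.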

From a CSV file

You can directly load a graph from a CSV or TSV file.

[52]:

graph = from_csv('miserables.tsv')

[53]:

graph

[53]:

{'names': array(['Anzelma', 'Babet', 'Bahorel', 'Bamatabois', 'Baroness',
        'Blacheville', 'Bossuet', 'Boulatruelle', 'Brevet', 'Brujon',
        'Champmathieu', 'Champtercier', 'Chenildieu', 'Child1', 'Child2',
        'Claquesous', 'Cochepaille', 'Combeferre', 'Cosette', 'Count',
        'Countess de Lo', 'Courfeyrac', 'Cravatte', 'Dahlia', 'Enjolras',
        'Eponine', 'Fameuil', 'Fantine', 'Fauchelevent', 'Favourite',
        'Feuilly', 'Gavroche', 'Geborand', 'Gervais', 'Gillenormand',
        'Grantaire', 'Gribier', 'Gueulemer', 'Isabeau', 'Javert', 'Joly',
        'Jondrette', 'Judge', 'Labarre', 'Listolier', 'Lt Gillenormand',
        'Mabeuf', 'Magnon', 'Marguerite', 'Marius', 'Mlle Baptistine',
        'Mlle Gillenormand', 'Mlle Vaubois', 'Mme Burgon', 'Mme Der',
        'Mme Hucheloup', 'Mme Magloire', 'Mme Pontmercy', 'Mme Thenardier',
        'Montparnasse', 'MotherInnocent', 'MotherPlutarch', 'Myriel',
        'Napoleon', 'Old man', 'Perpetue', 'Pontmercy', 'Prouvaire',
        'Scaufflaire', 'Simplice', 'Thenardier', 'Tholomyes', 'Toussaint',
        'Valjean', 'Woman1', 'Woman2', 'Zephine'], dtype='<U17'),
 'adjacency': <77x77 sparse matrix of type '<class 'numpy.int64'>'
        with 508 stored elements in Compressed Sparse Row format>}

[54]:

graph = from_csv('movie_actor.tsv', bipartite=True)

[55]:

graph

[55]:

{'names_row': array(['007 Spectre', 'Aviator', 'Crazy Stupid Love', 'Drive',
        'Fantastic Beasts 2', 'Inception', 'Inglourious Basterds',
        'La La Land', 'Midnight In Paris', 'Murder on the Orient Express',
        'The Big Short', 'The Dark Knight Rises',
        'The Grand Budapest Hotel', 'The Great Gatsby', 'Vice'],
       dtype='<U28'),
 'names': array(['007 Spectre', 'Aviator', 'Crazy Stupid Love', 'Drive',
        'Fantastic Beasts 2', 'Inception', 'Inglourious Basterds',
        'La La Land', 'Midnight In Paris', 'Murder on the Orient Express',
        'The Big Short', 'The Dark Knight Rises',
        'The Grand Budapest Hotel', 'The Great Gatsby', 'Vice'],
       dtype='<U28'),
 'names_col': array(['Brad Pitt', 'Carey Mulligan', 'Christian Bale',
        'Christophe Waltz', 'Emma Stone', 'Johnny Depp',
        'Joseph Gordon Lewitt', 'Jude Law', 'Lea Seydoux',
        'Leonardo DiCaprio', 'Marion Cotillard', 'Owen Wilson',
        'Ralph Fiennes', 'Ryan Gosling', 'Steve Carell', 'Willem Dafoe'],
       dtype='<U28'),
 'biadjacency': <15x16 sparse matrix of type '<class 'numpy.int64'>'
        with 41 stored elements in Compressed Sparse Row format>}

The graph can also be given in the form of adjacency lists (see the documentation of the function from_csv).
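For reference, a TSV edge list is simply one edge per line, with the two node names separated by a tab. The sketch below parses such content with the standard csv module, using an in-memory string as a stand-in for the first lines of a file. It only illustrates the file layout and is not how from_csv is implemented.

```python
import csv
import io

# Stand-in for the first lines of a tab-separated edge list file.
tsv_text = "Myriel\tNapoleon\nMyriel\tMlle Baptistine\nMyriel\tMme Magloire\n"

# Each non-empty row becomes one (source, target) pair.
reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
edge_list = [tuple(row) for row in reader if row]
print(edge_list)
```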

From a GraphML file

You can also load a graph stored in the GraphML format.

[56]:

graph = from_graphml('miserables.graphml')
adjacency = graph.adjacency
names = graph.names

[57]:

# Directed graph
graph = from_graphml('painters.graphml')
adjacency = graph.adjacency
names = graph.names

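From NetworkX

NetworkX has import and export functions from and towards the CSR format.

Other options

You want to test our toy graphs.
You want to generate a graph from a model.
You want to load a graph from existing repositories (see NetSet and KONECT).

Take a look at the other tutorials of the data section!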