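Load your data
In scikit-network, a graph is represented by its adjacency matrix (or biadjacency matrix for a bipartite graph), stored in the Compressed Sparse Row (CSR) format of SciPy.
This tutorial presents several ways to instantiate a graph in this format.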
[1]:
from IPython.display import SVG
import numpy as np
from scipy import sparse
import pandas as pd
from sknetwork.data import from_edge_list, from_adjacency_list, from_graphml, from_csv
from sknetwork.visualization import visualize_graph, visualize_bigraph
From a NumPy array
For small graphs, you can instantiate the adjacency matrix as a dense NumPy array and convert it into a sparse matrix in CSR format.
[2]:
adjacency = np.array([[0, 1, 1, 0], [1, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 0]])
adjacency = sparse.csr_matrix(adjacency)
image = visualize_graph(adjacency)
SVG(image)
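As a side note, the three arrays that make up a CSR matrix can be inspected directly. The sketch below rebuilds the same matrix with NumPy and SciPy only; the names `dense` and `neighbors_of_1` are illustrative and not part of scikit-network.

```python
import numpy as np
from scipy import sparse

dense = np.array([[0, 1, 1, 0], [1, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 0]])
adjacency = sparse.csr_matrix(dense)

# indptr[i]:indptr[i + 1] delimits the stored entries of row i
print(adjacency.indptr)
# indices holds the column index of each stored entry, data holds its value
print(adjacency.indices)
print(adjacency.data)

# the neighbors of node 1 are the column indices stored for row 1
neighbors_of_1 = adjacency.indices[adjacency.indptr[1]:adjacency.indptr[2]]
print(neighbors_of_1)
```

Only the 8 non-zero entries are stored, which is why this format scales to large sparse graphs.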
From an edge list
Another natural way to build a graph is from a list of edges.
[3]:
edge_list = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
adjacency = from_edge_list(edge_list)
image = visualize_graph(adjacency)
SVG(image)
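To make the link between an edge list and a sparse matrix concrete, here is a minimal sketch that uses SciPy only. It is not the implementation of `from_edge_list`; it simply assumes that an undirected graph stores each edge in both directions. The names `one_way` and `symmetric` are illustrative.

```python
import numpy as np
from scipy import sparse

edge_list = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
row, col = np.array(edge_list).T
n_nodes = int(max(row.max(), col.max())) + 1

# one stored entry per listed edge
values = np.ones(len(edge_list), dtype=int)
one_way = sparse.csr_matrix((values, (row, col)), shape=(n_nodes, n_nodes))

# adding the transpose stores each edge in both directions
symmetric = sparse.csr_matrix(one_way + one_way.T)

print(one_way.nnz, symmetric.nnz)
```

Five edges give five stored entries in the one-way matrix and ten in the symmetric one.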
By default, the graph is undirected, but you can easily make it directed.
[4]:
adjacency = from_edge_list(edge_list, directed=True)
image = visualize_graph(adjacency)
SVG(image)
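In a directed graph, in-degrees and out-degrees generally differ. The sketch below computes both with SciPy, assuming the usual convention that entry (i, j) encodes an edge from node i to node j; the variable names are illustrative.

```python
import numpy as np
from scipy import sparse

edge_list = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
row, col = np.array(edge_list).T
values = np.ones(len(edge_list), dtype=int)
adjacency = sparse.csr_matrix((values, (row, col)), shape=(4, 4))

# row sums count outgoing edges, column sums count incoming edges
out_degrees = np.asarray(adjacency.sum(axis=1)).ravel()
in_degrees = np.asarray(adjacency.sum(axis=0)).ravel()

print(out_degrees)
print(in_degrees)
```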
You might also want to add weights to your edges. Just use triplets instead of pairs!
[5]:
edge_list = [(0, 1, 1), (1, 2, 0.5), (2, 3, 1), (3, 0, 0.5), (0, 2, 2)]
adjacency = from_edge_list(edge_list)
image = visualize_graph(adjacency)
SVG(image)
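The weighted case can be sketched in the same way with SciPy: the third element of each triplet becomes the stored value. This is an illustration under the assumption that an undirected weighted graph stores the same weight in both directions, not the library's own code.

```python
import numpy as np
from scipy import sparse

edge_list = [(0, 1, 1), (1, 2, 0.5), (2, 3, 1), (3, 0, 0.5), (0, 2, 2)]
row, col, weight = np.array(edge_list).T

# the mixed int/float triplets are read as floats, so cast the indices back
row, col = row.astype(int), col.astype(int)

one_way = sparse.csr_matrix((weight, (row, col)), shape=(4, 4))
adjacency = sparse.csr_matrix(one_way + one_way.T)

print(adjacency[0, 2], adjacency[2, 0], adjacency[1, 2])
```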
You can also instantiate a bipartite graph.
[6]:
edge_list = [(0, 0), (1, 0), (1, 1), (2, 1)]
biadjacency = from_edge_list(edge_list, bipartite=True)
image = visualize_bigraph(biadjacency)
SVG(image)
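A biadjacency matrix has one row per node of the first set and one column per node of the second set. If a square adjacency matrix over all nodes is needed, one possible construction is the block matrix [[0, B], [B^T, 0]]. The sketch below builds it with `scipy.sparse.bmat`; the node ordering (rows first, then columns) is an assumption of this example.

```python
import numpy as np
from scipy import sparse

edge_list = [(0, 0), (1, 0), (1, 1), (2, 1)]
row, col = np.array(edge_list).T
values = np.ones(len(edge_list), dtype=int)
biadjacency = sparse.csr_matrix((values, (row, col)), shape=(3, 2))

# nodes 0..2 are the rows of B, nodes 3..4 are its columns
adjacency = sparse.bmat([[None, biadjacency], [biadjacency.T, None]], format='csr')

print(biadjacency.shape, adjacency.shape)
```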
If nodes are not indexed, you get an object of type Bunch with graph attributes (node names).
[7]:
edge_list = [("Alice", "Bob"), ("Bob", "Carey"), ("Alice", "David"), ("Carey", "David"), ("Bob", "David")]
graph = from_edge_list(edge_list)
[8]:
graph
[8]:
{'names': array(['Alice', 'Bob', 'Carey', 'David'], dtype='<U5'),
'adjacency': <4x4 sparse matrix of type '<class 'numpy.int64'>'
with 10 stored elements in Compressed Sparse Row format>}
[9]:
adjacency = graph.adjacency
names = graph.names
[10]:
image = visualize_graph(adjacency, names=names)
SVG(image)
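The `names` array maps each row index to a node name. A minimal sketch of how such a mapping can be built and used, assuming names are sorted as in the output above and using `np.unique` (this is not necessarily how scikit-network does it):

```python
import numpy as np
from scipy import sparse

edge_list = [("Alice", "Bob"), ("Bob", "Carey"), ("Alice", "David"), ("Carey", "David"), ("Bob", "David")]

# sorted unique names, plus the index of each endpoint in that array
names, inverse = np.unique(np.array(edge_list).ravel(), return_inverse=True)
row, col = inverse.reshape(-1, 2).T

values = np.ones(len(edge_list), dtype=int)
one_way = sparse.csr_matrix((values, (row, col)), shape=(len(names), len(names)))
adjacency = sparse.csr_matrix(one_way + one_way.T)

# reverse lookup: from a name to its row index, then to neighbor names
index = {name: i for i, name in enumerate(names)}
bob_neighbors = sorted(names[adjacency[index["Bob"]].indices].tolist())
print(bob_neighbors)
```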
By default, the weight of each edge is the number of occurrences of the corresponding link.
[11]:
edge_list_new = edge_list + [("Alice", "Bob"), ("Alice", "David"), ("Alice", "Bob")]
graph = from_edge_list(edge_list_new)
[12]:
adjacency = graph.adjacency
names = graph.names
[13]:
image = visualize_graph(adjacency, names=names)
SVG(image)
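The counting rule can be checked by hand with the standard library. This sketch only reproduces the expected weights; it treats (u, v) and (v, u) as the same undirected edge, which is an assumption of the example.

```python
from collections import Counter

edge_list = [("Alice", "Bob"), ("Bob", "Carey"), ("Alice", "David"), ("Carey", "David"), ("Bob", "David")]
edge_list_new = edge_list + [("Alice", "Bob"), ("Alice", "David"), ("Alice", "Bob")]

# sort each pair so that both orientations of an edge share one key
counts = Counter(tuple(sorted(edge)) for edge in edge_list_new)

for edge, weight in sorted(counts.items()):
    print(edge, weight)
```

Here the link between Alice and Bob occurs three times and the link between Alice and David twice, so these edges get weights 3 and 2.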
You can make the graph unweighted.
[14]:
graph = from_edge_list(edge_list_new, weighted=False)
[15]:
adjacency = graph.adjacency
names = graph.names
[16]:
image = visualize_graph(adjacency, names=names)
SVG(image)
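If a weighted matrix has already been built, it can also be binarized afterwards. The sketch below uses a small hypothetical matrix with a repeated edge and SciPy only; it illustrates the idea and is not what `weighted=False` does internally.

```python
import numpy as np
from scipy import sparse

# three parallel copies of edge (0, 1), one copy each of (0, 2) and (1, 2)
row = np.array([0, 0, 0, 0, 1])
col = np.array([1, 1, 1, 2, 2])
values = np.ones(len(row), dtype=int)

# converting from COO to CSR sums the duplicate entries
counts = sparse.coo_matrix((values, (row, col)), shape=(3, 3)).tocsr()

# keep the sparsity pattern, set every stored weight to 1
binary = (counts > 0).astype(int)

print(counts[0, 1], binary[0, 1])
```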
Again, you can make the graph directed.
[17]:
graph = from_edge_list(edge_list, directed=True)
[18]:
graph
[18]:
{'names': array(['Alice', 'Bob', 'Carey', 'David'], dtype='<U5'),
'adjacency': <4x4 sparse matrix of type '<class 'numpy.int64'>'
with 5 stored elements in Compressed Sparse Row format>}
[19]:
adjacency = graph.adjacency
names = graph.names
[20]:
image = visualize_graph(adjacency, names=names)
SVG(image)
The graph can also have explicit weights.
[21]:
edge_list = [("Alice", "Bob", 3), ("Bob", "Carey", 2), ("Alice", "David", 1), ("Carey", "David", 2), ("Bob", "David", 3)]
graph = from_edge_list(edge_list)
[22]:
adjacency = graph.adjacency
names = graph.names
[23]:
image = visualize_graph(adjacency, names=names, display_edge_weight=True, display_node_weight=True)
SVG(image)
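One natural notion of node weight is the weighted degree, that is, the sum of the weights of the edges incident to a node. Whether the visualization uses exactly this quantity is an assumption here. The sketch computes it with SciPy, with Alice, Bob, Carey and David indexed 0 to 3.

```python
import numpy as np
from scipy import sparse

# same weighted edges as above, with names replaced by indices
edge_list = [(0, 1, 3), (1, 2, 2), (0, 3, 1), (2, 3, 2), (1, 3, 3)]
row, col, weight = np.array(edge_list).T

one_way = sparse.csr_matrix((weight, (row, col)), shape=(4, 4))
adjacency = sparse.csr_matrix(one_way + one_way.T)

# multiplying by a vector of ones sums the weights on each row
weighted_degrees = adjacency.dot(np.ones(4))
print(weighted_degrees)
```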
The same works for a bipartite graph.
[24]:
edge_list = [("Alice", "Football"), ("Bob", "Tennis"), ("David", "Football"), ("Carey", "Tennis"), ("Carey", "Football")]
graph = from_edge_list(edge_list, bipartite=True)
[25]:
biadjacency = graph.biadjacency
names = graph.names
names_col = graph.names_col
[26]:
image = visualize_bigraph(biadjacency, names_row=names, names_col=names_col)
SVG(image)
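In the bipartite case, rows and columns are indexed separately, which is why two arrays of names are returned. A minimal sketch with `np.unique` applied to each side (an illustration, not the library's implementation):

```python
import numpy as np
from scipy import sparse

edge_list = [("Alice", "Football"), ("Bob", "Tennis"), ("David", "Football"), ("Carey", "Tennis"), ("Carey", "Football")]
rows, cols = zip(*edge_list)

# each side gets its own sorted names and its own index space
names_row, row = np.unique(rows, return_inverse=True)
names_col, col = np.unique(cols, return_inverse=True)

values = np.ones(len(edge_list), dtype=int)
biadjacency = sparse.csr_matrix((values, (row, col)), shape=(len(names_row), len(names_col)))

print(names_row, names_col)
print(biadjacency.toarray())
```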
From an adjacency list
You can also load a graph from an adjacency list, given as a list of lists or a dictionary of lists.
[27]:
adjacency_list = [[0, 1, 2], [2, 3]]
adjacency = from_adjacency_list(adjacency_list, directed=True)
[28]:
image = visualize_graph(adjacency)
SVG(image)
[29]:
adjacency_dict = {"Alice": ["Bob", "David"], "Bob": ["Carey", "David"]}
graph = from_adjacency_list(adjacency_dict, directed=True)
[30]:
adjacency = graph.adjacency
names = graph.names
[31]:
image = visualize_graph(adjacency, names=names)
SVG(image)
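An adjacency list is equivalent to an edge list. The sketch below flattens both forms in plain Python, assuming that the i-th inner list holds the neighbors of node i and that each dictionary key is a source node.

```python
adjacency_list = [[0, 1, 2], [2, 3]]
adjacency_dict = {"Alice": ["Bob", "David"], "Bob": ["Carey", "David"]}

# list of lists: the position in the outer list is the source node
edges_from_list = [(source, target)
                   for source, targets in enumerate(adjacency_list)
                   for target in targets]

# dictionary of lists: the key is the source node
edges_from_dict = [(source, target)
                   for source, targets in adjacency_dict.items()
                   for target in targets]

print(edges_from_list)
print(edges_from_dict)
```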
From a dataframe
Your dataframe might consist of an edge list.
[32]:
df = pd.read_csv('miserables.tsv', sep='\t', names=['character_1', 'character_2'])
[33]:
df.head()
[33]:
 | character_1 | character_2 |
---|---|---|
0 | Myriel | Napoleon |
1 | Myriel | Mlle Baptistine |
2 | Myriel | Mme Magloire |
3 | Myriel | Countess de Lo |
4 | Myriel | Geborand |
[34]:
edge_list = list(df.itertuples(index=False))
[35]:
graph = from_edge_list(edge_list)
[36]:
graph
[36]:
{'names': array(['Anzelma', 'Babet', 'Bahorel', 'Bamatabois', 'Baroness',
'Blacheville', 'Bossuet', 'Boulatruelle', 'Brevet', 'Brujon',
'Champmathieu', 'Champtercier', 'Chenildieu', 'Child1', 'Child2',
'Claquesous', 'Cochepaille', 'Combeferre', 'Cosette', 'Count',
'Countess de Lo', 'Courfeyrac', 'Cravatte', 'Dahlia', 'Enjolras',
'Eponine', 'Fameuil', 'Fantine', 'Fauchelevent', 'Favourite',
'Feuilly', 'Gavroche', 'Geborand', 'Gervais', 'Gillenormand',
'Grantaire', 'Gribier', 'Gueulemer', 'Isabeau', 'Javert', 'Joly',
'Jondrette', 'Judge', 'Labarre', 'Listolier', 'Lt Gillenormand',
'Mabeuf', 'Magnon', 'Marguerite', 'Marius', 'Mlle Baptistine',
'Mlle Gillenormand', 'Mlle Vaubois', 'Mme Burgon', 'Mme Der',
'Mme Hucheloup', 'Mme Magloire', 'Mme Pontmercy', 'Mme Thenardier',
'Montparnasse', 'MotherInnocent', 'MotherPlutarch', 'Myriel',
'Napoleon', 'Old man', 'Perpetue', 'Pontmercy', 'Prouvaire',
'Scaufflaire', 'Simplice', 'Thenardier', 'Tholomyes', 'Toussaint',
'Valjean', 'Woman1', 'Woman2', 'Zephine'], dtype='<U17'),
'adjacency': <77x77 sparse matrix of type '<class 'numpy.int64'>'
with 508 stored elements in Compressed Sparse Row format>}
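The conversion from dataframe rows to an edge list can be tried without the data file. The sketch below uses a small in-memory dataframe holding the first rows shown above; passing `name=None` to `itertuples` is a choice of this example that yields plain tuples instead of named tuples.

```python
import pandas as pd

# in-memory stand-in for the first rows of the file
df = pd.DataFrame({
    'character_1': ['Myriel', 'Myriel', 'Myriel'],
    'character_2': ['Napoleon', 'Mlle Baptistine', 'Mme Magloire'],
})

# one tuple per row, without the index
edge_list = list(df.itertuples(index=False, name=None))
print(edge_list)
```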
[37]:
df = pd.read_csv('movie_actor.tsv', sep='\t', names=['movie', 'actor'])
[38]:
df.head()
[38]:
 | movie | actor |
---|---|---|
0 | Inception | Leonardo DiCaprio |
1 | Inception | Marion Cotillard |
2 | Inception | Joseph Gordon Lewitt |
3 | The Dark Knight Rises | Marion Cotillard |
4 | The Dark Knight Rises | Joseph Gordon Lewitt |
[39]:
edge_list = list(df.itertuples(index=False))
[40]:
graph = from_edge_list(edge_list, bipartite=True)
[41]:
graph
[41]:
{'names_row': array(['007 Spectre', 'Aviator', 'Crazy Stupid Love', 'Drive',
'Fantastic Beasts 2', 'Inception', 'Inglourious Basterds',
'La La Land', 'Midnight In Paris', 'Murder on the Orient Express',
'The Big Short', 'The Dark Knight Rises',
'The Grand Budapest Hotel', 'The Great Gatsby', 'Vice'],
dtype='<U28'),
'names': array(['007 Spectre', 'Aviator', 'Crazy Stupid Love', 'Drive',
'Fantastic Beasts 2', 'Inception', 'Inglourious Basterds',
'La La Land', 'Midnight In Paris', 'Murder on the Orient Express',
'The Big Short', 'The Dark Knight Rises',
'The Grand Budapest Hotel', 'The Great Gatsby', 'Vice'],
dtype='<U28'),
'names_col': array(['Brad Pitt', 'Carey Mulligan', 'Christian Bale',
'Christophe Waltz', 'Emma Stone', 'Johnny Depp',
'Joseph Gordon Lewitt', 'Jude Law', 'Lea Seydoux',
'Leonardo DiCaprio', 'Marion Cotillard', 'Owen Wilson',
'Ralph Fiennes', 'Ryan Gosling', 'Steve Carell', 'Willem Dafoe'],
dtype='<U28'),
'biadjacency': <15x16 sparse matrix of type '<class 'numpy.int64'>'
with 41 stored elements in Compressed Sparse Row format>}
For categorical data, you can use pandas to get the bipartite graph between samples and features. We show an example taken from the Adult Income dataset.
[42]:
df = pd.read_csv('adult-income.csv')
[43]:
df.head()
[43]:
 | age | workclass | occupation | relationship | gender | income |
---|---|---|---|---|---|---|
0 | 40-49 | State-gov | Adm-clerical | Not-in-family | Male | <=50K |
1 | 50-59 | Self-emp-not-inc | Exec-managerial | Husband | Male | <=50K |
2 | 40-49 | Private | Handlers-cleaners | Not-in-family | Male | <=50K |
3 | 50-59 | Private | Handlers-cleaners | Husband | Male | <=50K |
4 | 30-39 | Private | Prof-specialty | Wife | Female | <=50K |
[44]:
df_binary = pd.get_dummies(df, sparse=True)
[45]:
df_binary.head()
[45]:
 | age_20-29 | age_30-39 | age_40-49 | age_50-59 | age_60-69 | age_70-79 | age_80-89 | age_90-99 | workclass_ ? | workclass_ Federal-gov | ... | relationship_ Husband | relationship_ Not-in-family | relationship_ Other-relative | relationship_ Own-child | relationship_ Unmarried | relationship_ Wife | gender_ Female | gender_ Male | income_ <=50K | income_ >50K |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | False | False | True | False | False | False | False | False | False | False | ... | False | True | False | False | False | False | False | True | True | False |
1 | False | False | False | True | False | False | False | False | False | False | ... | True | False | False | False | False | False | False | True | True | False |
2 | False | False | True | False | False | False | False | False | False | False | ... | False | True | False | False | False | False | False | True | True | False |
3 | False | False | False | True | False | False | False | False | False | False | ... | True | False | False | False | False | False | False | True | True | False |
4 | False | True | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | True | True | False | True | False |
5 rows × 42 columns
[46]:
biadjacency = df_binary.sparse.to_coo()
[47]:
biadjacency = sparse.csr_matrix(biadjacency)
[48]:
# biadjacency matrix of the bipartite graph
biadjacency
[48]:
<32561x42 sparse matrix of type '<class 'numpy.bool_'>'
with 195366 stored elements in Compressed Sparse Row format>
[49]:
# names of columns
names_col = list(df_binary)
[50]:
len(names_col)
[50]:
42
[51]:
names_col[:8]
[51]:
['age_20-29',
'age_30-39',
'age_40-49',
'age_50-59',
'age_60-69',
'age_70-79',
'age_80-89',
'age_90-99']
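The same pipeline can be tried on a toy dataframe. The data below is hypothetical and only mimics two of the columns; the point is that each sample activates exactly one indicator per categorical column, which matches the count above (32561 samples × 6 columns = 195366 stored elements).

```python
import pandas as pd
from scipy import sparse

# hypothetical toy data with two categorical columns
df = pd.DataFrame({
    'gender': ['Male', 'Female', 'Male'],
    'income': ['<=50K', '<=50K', '>50K'],
})

# one indicator column per (column, category) pair, stored sparsely
df_binary = pd.get_dummies(df, sparse=True)
biadjacency = sparse.csr_matrix(df_binary.sparse.to_coo())

print(list(df_binary))
print(biadjacency.shape, biadjacency.nnz)
```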
From a CSV file
You can directly load a graph from a CSV or TSV file.
[52]:
graph = from_csv('miserables.tsv')
[53]:
graph
[53]:
{'names': array(['Anzelma', 'Babet', 'Bahorel', 'Bamatabois', 'Baroness',
'Blacheville', 'Bossuet', 'Boulatruelle', 'Brevet', 'Brujon',
'Champmathieu', 'Champtercier', 'Chenildieu', 'Child1', 'Child2',
'Claquesous', 'Cochepaille', 'Combeferre', 'Cosette', 'Count',
'Countess de Lo', 'Courfeyrac', 'Cravatte', 'Dahlia', 'Enjolras',
'Eponine', 'Fameuil', 'Fantine', 'Fauchelevent', 'Favourite',
'Feuilly', 'Gavroche', 'Geborand', 'Gervais', 'Gillenormand',
'Grantaire', 'Gribier', 'Gueulemer', 'Isabeau', 'Javert', 'Joly',
'Jondrette', 'Judge', 'Labarre', 'Listolier', 'Lt Gillenormand',
'Mabeuf', 'Magnon', 'Marguerite', 'Marius', 'Mlle Baptistine',
'Mlle Gillenormand', 'Mlle Vaubois', 'Mme Burgon', 'Mme Der',
'Mme Hucheloup', 'Mme Magloire', 'Mme Pontmercy', 'Mme Thenardier',
'Montparnasse', 'MotherInnocent', 'MotherPlutarch', 'Myriel',
'Napoleon', 'Old man', 'Perpetue', 'Pontmercy', 'Prouvaire',
'Scaufflaire', 'Simplice', 'Thenardier', 'Tholomyes', 'Toussaint',
'Valjean', 'Woman1', 'Woman2', 'Zephine'], dtype='<U17'),
'adjacency': <77x77 sparse matrix of type '<class 'numpy.int64'>'
with 508 stored elements in Compressed Sparse Row format>}
[54]:
graph = from_csv('movie_actor.tsv', bipartite=True)
[55]:
graph
[55]:
{'names_row': array(['007 Spectre', 'Aviator', 'Crazy Stupid Love', 'Drive',
'Fantastic Beasts 2', 'Inception', 'Inglourious Basterds',
'La La Land', 'Midnight In Paris', 'Murder on the Orient Express',
'The Big Short', 'The Dark Knight Rises',
'The Grand Budapest Hotel', 'The Great Gatsby', 'Vice'],
dtype='<U28'),
'names': array(['007 Spectre', 'Aviator', 'Crazy Stupid Love', 'Drive',
'Fantastic Beasts 2', 'Inception', 'Inglourious Basterds',
'La La Land', 'Midnight In Paris', 'Murder on the Orient Express',
'The Big Short', 'The Dark Knight Rises',
'The Grand Budapest Hotel', 'The Great Gatsby', 'Vice'],
dtype='<U28'),
'names_col': array(['Brad Pitt', 'Carey Mulligan', 'Christian Bale',
'Christophe Waltz', 'Emma Stone', 'Johnny Depp',
'Joseph Gordon Lewitt', 'Jude Law', 'Lea Seydoux',
'Leonardo DiCaprio', 'Marion Cotillard', 'Owen Wilson',
'Ralph Fiennes', 'Ryan Gosling', 'Steve Carell', 'Willem Dafoe'],
dtype='<U28'),
'biadjacency': <15x16 sparse matrix of type '<class 'numpy.int64'>'
with 41 stored elements in Compressed Sparse Row format>}
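For files with an unusual layout, it can help to parse the edge list by hand and then pass it to `from_edge_list`. The sketch below reads tab-separated pairs with the standard `csv` module; the in-memory text is a hypothetical stand-in for a file, and this is not how `from_csv` is implemented.

```python
import csv
import io

# hypothetical in-memory stand-in for a small TSV file
tsv_text = "Myriel\tNapoleon\nMyriel\tMlle Baptistine\nMyriel\tMme Magloire\n"

reader = csv.reader(io.StringIO(tsv_text), delimiter='\t')
# skip empty lines, keep one tuple per row
edge_list = [tuple(line) for line in reader if line]

print(edge_list)
```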
The graph can also be given as adjacency lists (see the function from_csv).
From a GraphML file
You can also load a graph stored in the GraphML format.
[56]:
graph = from_graphml('miserables.graphml')
adjacency = graph.adjacency
names = graph.names
[57]:
# Directed graph
graph = from_graphml('painters.graphml')
adjacency = graph.adjacency
names = graph.names
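GraphML is an XML format in which the `edgedefault` attribute of the `graph` element states whether edges are directed. The sketch below reads a tiny hypothetical document with the standard library to show where nodes, edges and direction come from; it is not the parser used by `from_graphml`.

```python
import xml.etree.ElementTree as ET

# hypothetical minimal GraphML document
graphml_text = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="directed">
    <node id="n0"/>
    <node id="n1"/>
    <node id="n2"/>
    <edge source="n0" target="n1"/>
    <edge source="n1" target="n2"/>
  </graph>
</graphml>
"""

ns = {'g': 'http://graphml.graphdrawing.org/xmlns'}
root = ET.fromstring(graphml_text)
graph_element = root.find('g:graph', ns)

directed = graph_element.get('edgedefault') == 'directed'
names = [node.get('id') for node in graph_element.findall('g:node', ns)]
edge_list = [(edge.get('source'), edge.get('target'))
             for edge in graph_element.findall('g:edge', ns)]

print(directed, names, edge_list)
```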
From NetworkX
Other options
Take a look at the other tutorials of the data section!