This section looks at the page's UI and how its HTML is organized, with the goal of scraping the page data and parsing it.

<div class="clan__table">
      <div class="clan__headers">
        <div class="clan__headerCaption">Rank</div>
        <div class="clan__headerCaption">Name</div>
        <div class="clan__headerCaption">Level</div>
        <div class="clan__headerCaption">League</div>
        <div class="clan__headerCaption">Trophies</div>
        <div class="clan__headerCaption">Donations</div>
        <div class="clan__headerCaption">Role</div>
      </div>


      <div class="clan__rowContainer">
        <div class="clan__row">
                            #1
                    </div>
        <div class="clan__row">
          <a class="ui__blueLink" href="/profile/2P0V2CCY">北斗</a>
        </div>
        <div class="clan__row">
          <span class="clan__playerLevel">11</span>
        </div>
        <div class="clan__row">
          <div class="clan__leagueContainer">
                            <div class="league__2"></div>
          </div>
        </div>
        <div class="clan__row">
          <div class="clan__cup">4438</div>
        </div>
        <div class="clan__row">379</div>
        <div class="clan__row">
             Leader
        </div>
      </div>


      <div class="clan__rowContainer">
        <div class="clan__row">
                            #2
                    </div>
        <div class="clan__row">
          <a class="ui__blueLink" href="/profile/9UURJRQU">wglj</a>
        </div>
        <div class="clan__row">
          <span class="clan__playerLevel">12</span>
        </div>
        <div class="clan__row">
          <div class="clan__leagueContainer">
                            <div class="league__2"></div>
           </div>
        </div>
        <div class="clan__row">
          <div class="clan__cup">4344</div>
        </div>
        <div class="clan__row">498</div>
        <div class="clan__row">
             Co-Leader
        </div>
      </div>


</div>


Looking at the page source, we can see that each player's information is stored in a div whose class is clan__rowContainer.
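One quick way to check this observation is to load the markup into BeautifulSoup and count the matching containers. The sketch below is only illustrative: the html string is a trimmed stand-in for the real page source, the html.parser backend is an assumption (any parser BeautifulSoup supports should behave the same here), and find_all is the current spelling of findAll.

```python
from bs4 import BeautifulSoup

# Trimmed stand-in for the page source (assumption: the real rows carry more columns).
html = """
<div class="clan__table">
  <div class="clan__rowContainer">
    <div class="clan__row">#1</div>
    <div class="clan__row"><a class="ui__blueLink" href="/profile/2P0V2CCY">北斗</a></div>
  </div>
  <div class="clan__rowContainer">
    <div class="clan__row">#2</div>
    <div class="clan__row"><a class="ui__blueLink" href="/profile/9UURJRQU">wglj</a></div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# One container per player row.
containers = soup.find_all("div", attrs={"class": "clan__rowContainer"})

# The first <a> inside each container holds the player name.
player_names = [str(c.a.string) for c in containers]

print(len(containers), player_names)
```

If the count matches the number of players shown on the page, the container class is a safe anchor for the row-by-row parsing.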

We can therefore use soup's findAll selector to fetch every player row, then iterate over the rows and parse each player's data in turn.

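# Assumes soup is a BeautifulSoup object built from the page source shown above.
for row in soup.findAll("div", attrs={"class": "clan__rowContainer"}):
    user_dict = {}
    # Each player row holds seven clan__row cells, in the same order as the table headers.
    for j, col in enumerate(row.findAll("div", attrs={"class": "clan__row"})):
        if j == 0:
            # Rank is rendered as "#1"; drop the leading "#".
            user_dict["rank"] = col.string.strip().replace("#", "")
        elif j == 1:
            user_dict["name"] = col.a.string.strip()
            # Take the last path segment of "/profile/<uid>". str.strip("/profile/")
            # removes a set of characters, not a prefix, so it could eat into the uid.
            user_dict["uid"] = col.a.get("href").rsplit("/", 1)[-1]
        elif j == 2:
            user_dict["level"] = col.span.string.strip()
        elif j == 3:
            # The league is encoded in the inner div's class name, e.g. "league__2".
            # col.div.div does not depend on whitespace nodes the way col.contents[1] does.
            user_dict["league"] = col.div.div.get("class")[0].replace("league__", "")
        elif j == 4:
            user_dict["score"] = col.div.string.strip()
        elif j == 5:
            user_dict["donations"] = col.string.strip()
        elif j == 6:
            user_dict["role"] = col.string.strip()
    print(user_dict)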
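For experimenting without the live page, the same parsing can be wrapped in a function and run against an inline sample. This is a minimal sketch under stated assumptions, not the original script: parse_players and SAMPLE_HTML are hypothetical names, the sample is one row copied from the markup above, html.parser is an assumed backend, and converting the numeric fields to int is an addition the original code does not make.

```python
from bs4 import BeautifulSoup

# One player row copied from the page source above (hypothetical test fixture).
SAMPLE_HTML = """
<div class="clan__table">
  <div class="clan__rowContainer">
    <div class="clan__row">
        #2
    </div>
    <div class="clan__row">
      <a class="ui__blueLink" href="/profile/9UURJRQU">wglj</a>
    </div>
    <div class="clan__row">
      <span class="clan__playerLevel">12</span>
    </div>
    <div class="clan__row">
      <div class="clan__leagueContainer">
        <div class="league__2"></div>
      </div>
    </div>
    <div class="clan__row">
      <div class="clan__cup">4344</div>
    </div>
    <div class="clan__row">498</div>
    <div class="clan__row">
        Co-Leader
    </div>
  </div>
</div>
"""


def parse_players(html):
    """Return one dict per clan__rowContainer found in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    players = []
    for row in soup.find_all("div", attrs={"class": "clan__rowContainer"}):
        # Seven cells per row, in header order: Rank, Name, Level, League,
        # Trophies, Donations, Role.
        cols = row.find_all("div", attrs={"class": "clan__row"})
        link = cols[1].a
        players.append({
            "rank": int(cols[0].get_text(strip=True).lstrip("#")),
            "name": link.get_text(strip=True),
            # Last path segment of "/profile/<uid>".
            "uid": link["href"].rsplit("/", 1)[-1],
            "level": int(cols[2].get_text(strip=True)),
            # League number is encoded in the inner div's class, e.g. "league__2".
            "league": cols[3].div.div["class"][0].replace("league__", ""),
            "score": int(cols[4].get_text(strip=True)),
            "donations": int(cols[5].get_text(strip=True)),
            "role": cols[6].get_text(strip=True),
        })
    return players


players = parse_players(SAMPLE_HTML)
print(players)
```

Returning a list of dicts instead of printing inside the loop makes the result easier to sort, filter, or write to a file later.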