Building an Aggregated Search Tool from qBittorrent Search Plugins: HUDBT (蝴蝶) as an Example


Preface


Yesterday I saw a script in the Nanyang (南洋) group that can search Nanyang's torrents through qBittorrent's search-plugin mechanism. It looked like the same approach would work for HUDBT (蝴蝶), so I read the official documentation and set about writing one.

Test environment: qBittorrent 4.2.5 + Windows 10 + Python 3.6.4

Official documentation: https://github.com/qbittorrent/search-plugins/wiki/How-to-write-a-search-plugin/#python-class-file-structure

How It Works


First, you must understand that a qBittorrent search engine plugin is actually a Python class file whose task is to contact a search engine website (e.g. Mininova.org), parse the results displayed by the web page and print them on stdout with the following syntax:

link|name|size|seeds|leech|engine_url|desc_link

In other words, a search plugin is simply a Python file that generates search results. Each result must carry the seven fields above, and qBittorrent then displays them in the search pane.
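
For example, here is a minimal sketch of how a plugin hands one result to novaprinter; the example.org values are made up for illustration:

from novaprinter import prettyPrinter

# Made-up values for illustration only
result = {
    'link': 'https://example.org/download.php?id=1',      # torrent file URL
    'name': 'Ubuntu 20.04 ISO',
    'size': '2.5 GB',   # prettyPrinter converts a human-readable size to bytes
    'seeds': '10',
    'leech': '2',
    'engine_url': 'https://example.org/',
    'desc_link': 'https://example.org/details.php?id=1',  # details page URL
}
prettyPrinter(result)
# prints one pipe-separated line: link|name|size-in-bytes|seeds|leech|engine_url|desc_link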


Writing the Plugin


Your plugin should be named “engine_name.py”, in lowercase and without spaces or any special characters. You’ll also need the other files for the project (Link)

First, create a file named hudbt.py. Then download the files below from the link above; they will be needed for testing (see the test note after this list):

-> nova2.py # the main search engine script which calls the plugins
-> nova2dl.py # standalone script called by qBittorrent to download a torrent using a particular search plugin
-> helpers.py # contains helper functions you can use in your plugins such as retrieve_url() and download_file()
-> novaprinter.py # contains some useful functions like prettyPrint(my_dict) to display your search results
-> socks.py # Required by helpers.py. This module provides a standard socket-like interface.
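
With these files in the same directory as hudbt.py, the plugin can also be exercised outside of qBittorrent. Assuming nova2.py's usual argument order (engine list, category, keywords), a run like `python nova2.py hudbt all "Ubuntu"` should print one pipe-separated line per result, in the format shown above.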

The basic structure is as follows:

#VERSION: 1.00
#AUTHORS: YOUR_NAME (YOUR_MAIL)

# LICENSING INFORMATION

from helpers import download_file, retrieve_url
from novaprinter import prettyPrinter
from html.parser import HTMLParser
# some other imports if necessary

class engine_name(object):
    """
    `url`, `name`, `supported_categories` should be static variables of the engine_name class,
     otherwise qbt won't install the plugin.

    `url`: The URL of the search engine.
    `name`: The name of the search engine, spaces and special characters are allowed here.
    `supported_categories`: What categories are supported by the search engine and their corresponding id,
    possible categories are ('all', 'movies', 'tv', 'music', 'games', 'anime', 'software', 'pictures', 'books').
    """
    url = 'http://www.engine-url.org'
    name = 'Full engine name'
    supported_categories = {'all': '0', 'movies': '6', 'tv': '4', 'music': '1', 'games': '2', 'anime': '7', 'software': '3'}

    def __init__(self):
        """
        some initialization
        """

    def download_torrent(self, info):
        """
        Providing this function is optional.
        It can however be interesting to provide your own torrent download
        implementation in case the search engine in question does not allow
        traditional downloads (for example, cookie-based download).
        """
        print(download_file(info))

    # DO NOT CHANGE the name and parameters of this function
    # This function will be the one called by nova2.py
    def search(self, what, cat='all'):
        """
        Here you can do what you want to get the result from the search engine website.
        Everytime you parse a result line, store it in a dictionary
        and call the prettyPrint(your_dict) function.

        `what` is a string with the search tokens, already escaped (e.g. "Ubuntu+Linux")
        `cat` is the name of a search category in ('all', 'movies', 'tv', 'music', 'games', 'anime', 'software', 'pictures', 'books')
        """

All that remains for hudbt.py is to implement the search function. I used this engine as a reference: https://github.com/qbittorrent/search-plugins/blob/master/nova3/engines/limetorrents.py

Code Walkthrough


# VERSION: 1.0
# AUTHORS: tomorrow505
# CONTRIBUTORS: Lima66 and JJ404

# Reference: https://github.com/qbittorrent/search-plugins/blob/master/nova3/engines/limetorrents.py

from novaprinter import prettyPrinter
from html.parser import HTMLParser
import re
import requests

PASSKEY = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' # your HUDBT passkey, used to build download links

# Define the search plugin (the class name must match the file name, hudbt.py)
class hudbt(object):
    # The class must define these three attributes: url, name, and supported_categories
    # (out of all, movies, tv, music, games, anime, software, pictures, books)
    """
        `url`, `name`, `supported_categories` should be static variables of the engine_name class,
         otherwise qbt won't install the plugin.

        `url`: The URL of the search engine.
        `name`: The name of the search engine, spaces and special characters are allowed here.
        `supported_categories`: What categories are supported by the search engine and their corresponding id,
        possible categories are ('all', 'movies', 'tv', 'music', 'games', 'anime', 'software', 'pictures', 'books').
    """
    url = 'https://hudbt.hust.edu.cn/'
    name = 'hudbt'
    supported_categories = {
        'all': '',
        'movies': 'cat401=1&cat413=1&cat414=1&cat415=1&',
        'tv': 'cat402=1&cat417=1&cat416=1&cat418=1&',
        'anime': 'cat405=1&cat427=1&cat428=1&cat429=1&',
        'music': 'cat408=1&cat422=1&cat423=1&cat424=1&cat425=1&',
        'books': 'cat432=1&cat412=1&',
        'software': 'cat411=1&',
        'games': 'cat410=1&'
    }
	
    # Nested helper class that parses the results page to extract the fields above;
    # it subclasses HTMLParser and overrides the handler methods below
    class MyHtmlParser(HTMLParser):
        """ Sub-class for parsing results """
        def error(self, message):
            pass

        A, TD, TR, HREF = ('a', 'td', 'tr', 'href')
        def __init__(self, url):
            HTMLParser.__init__(self)
            self.url = url
            self.current_item = {}  # dict for found item
            self.item_name = None  # key's name in current_item dict
            
            # Row counter; row 0 is the table header, not a torrent
            self.current_tr = 0
            # Column counter; columns 4-6 hold size, seeders and leechers
            self.current_td = -1
            self.parser_td = {
                4: "size",  # td_number
                5: "seeds",
                6: "leech"
            }

            # Size unit, defaults to GiB
            self.size_data = 'GiB'

            # Whether we are inside a <tr>, i.e. a single torrent row
            self.inside_tr = False

            # Whether we are inside the torrent table (detected via id="torrents")
            self.findTable = False


        # Overridden: called for every opening tag
        def handle_starttag(self, tag, attrs):
            # Check whether we have reached the torrent table
            params = dict(attrs)
            if params.get('id') == 'torrents':
                self.findTable = True
            # Check whether this is a row (<tr>) of the torrent table
            if tag == self.TR and self.findTable:
                self.inside_tr = True

                # Skip the first row, which is the table header
                if not self.current_tr:
                    self.current_tr += 1
                    return

                # Start a fresh torrent record
                self.current_item = {}
                # Reset the column counter
                self.current_td = -1
            if not self.inside_tr:
                return
            
            # A cell (<td>) of a torrent row: map the column number to
            # size/seeds/leech and default the value to -1 when no data follows
            if self.inside_tr and tag == self.TD:
                self.current_td += 1
                self.item_name = self.parser_td.get(self.current_td, None)
                if self.item_name:
                    self.current_item[self.item_name] = -1

            # Detect the details-page link and build the download link from it
            if self.inside_tr and tag == self.A and self.HREF in params:
                link = "https:" + params["href"]
                if link.endswith("&hit=1"):
                    self.current_item["engine_url"] = self.url
                    self.current_item["desc_link"] = link
                    torrent_id = re.search(r'id=\d+', link).group()
                    self.current_item["link"] = 'https://hudbt.hust.edu.cn/download.php?{}&passkey={}'.format(
                        torrent_id, PASSKEY
                    )
                    self.item_name = "name"

        # Overridden: collects the text data for name/size/seeds/leech
        def handle_data(self, data):
            # The size cell arrives in two chunks (number first, unit second),
            # so append a unit suffix to the number captured earlier
            try:
                if data.endswith('MiB') and self.current_item['size']:
                    self.current_item['size'] += 'MB'
                elif data.endswith('GiB') and self.current_item['size']:
                    self.current_item['size'] += 'GB'
            except Exception:
                pass

            if self.inside_tr and self.item_name:
                if self.item_name == 'size':
                    self.current_item[self.item_name] = data
                else:
                    self.current_item[self.item_name] = data.strip().replace(',', '')
                self.item_name = None

        # Overridden: on closing tags, print the finished record and reset state
        def handle_endtag(self, tag):
            if tag == 'table':
                self.findTable = False

            if self.inside_tr and tag == self.TR:
                self.inside_tr = False
                self.item_name = None
                if not self.current_item:
                    return
                prettyPrinter(self.current_item)
                self.current_item = {}
	
    # The search method called by nova2.py; do not change its name or parameters
    def search(self, search_string, cat='all'):
        parser = self.MyHtmlParser(self.url)  # instantiate the nested parser class
        
        # Support IMDb and Douban searches: extract the bare ID and switch the
        # site's search_area parameter accordingly
        imdb_search_string = re.search(r'tt\d{5,12}', search_string)
        douban_search_string = re.search(r'subject/\d{2,12}', search_string)
        id_match = imdb_search_string or douban_search_string
        if id_match:
            search_string = id_match.group()
            search_area = '4'  # search by IMDb/Douban link
        else:
            search_area = '0'  # search by title
        query = '{0}torrents.php?{1}&inclbookmarked=0&search_area={2}&incldead=1&indate=0&search={3}'.format(
            self.url, self.supported_categories[cat.lower()], search_area, search_string)
        
        # Browser-like request headers; a logged-in hudbt cookie is required
        user_agent = 'Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0'
        headers = {
            'User-Agent': user_agent,
            'Cookie': ''  # paste your own cookie here
        }
        html = requests.get(query, headers=headers, verify=False).content.decode()
        parser.feed(html)
        parser.close()


if __name__ == "__main__":
    c = hudbt()
    c.search('请以你的名字呼唤我')
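
Two blanks must be filled in before this will run: PASSKEY (from your hudbt control panel) and the Cookie header. A minimal sketch of the headers, assuming the c_secure_* cookie names that NexusPHP sites typically use; copy the exact Cookie value from your browser's developer tools (F12 → Network → any request to the site → Request Headers) while logged in:

headers = {
    'User-Agent': user_agent,
    # Hypothetical value; NexusPHP sites usually issue cookies along these lines.
    # Paste the real Cookie header from your own logged-in session instead.
    'Cookie': 'c_secure_uid=MTIzNDU%3D; c_secure_pass=0123456789abcdef0123456789abcdef'
}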

Sample test output:

https://hudbt.hust.edu.cn/details.php?id=146462&hit=1|Call Me by Your Name 2017 BluRay 720p x264 AC3-CMCT|4939212390|1|0|https://hudbt.hust.edu.cn/|https://hudbt.hust.edu.cn/details.php?id=146462&hit=1

https://hudbt.hust.edu.cn/details.php?id=145095&hit=1|Call Me by Your Name 2017 Repack 1080p BluRay DTS x264-Geek|19993072762|6|0|https://hudbt.hust.edu.cn/|https://hudbt.hust.edu.cn/details.php?id=145095&hit=1

https://hudbt.hust.edu.cn/details.php?id=137637&hit=1|Call Me by Your Name 2017 1080p BluRay x264 DTS-WiKi|18146236825|0|0|https://hudbt.hust.edu.cn/|https://hudbt.hust.edu.cn/details.php?id=137637&hit=1

https://hudbt.hust.edu.cn/details.php?id=137635&hit=1|Call Me by Your Name 2017 720p BluRay x264-WiKi|8589934592|0|0|https://hudbt.hust.edu.cn/|https://hudbt.hust.edu.cn/details.php?id=137635&hit=1

https://hudbt.hust.edu.cn/details.php?id=137266&hit=1|Call Me By Your Name 2017 HD720P X264 AAC Multilingual CHS MF|2931315179|0|0|https://hudbt.hust.edu.cn/|https://hudbt.hust.edu.cn/details.php?id=137266&hit=1

Installation


  1. First, enable the search engine view from the View menu.
  2. Click Search, click Search plugins in the bottom-right corner, click Install a new one in the bottom-left corner, choose Local file, and select hudbt.py.


Search Results

Searching by IMDb ID: since the plugin uses a regex, pasting a complete IMDb link also works.
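
A quick illustration of why the full link works: the regex keeps only the tt-prefixed ID, so a full IMDb URL reduces to the bare ID before being sent to the site.

import re

# the same pattern used in search(): strips a full IMDb URL down to its ID
match = re.search(r'tt\d{5,12}', 'https://www.imdb.com/title/tt5726616/')
print(match.group())  # -> tt5726616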


Searching in Chinese returns the same results as the site's own search; however, when there are many matches, the plugin only fetches a single page of 50 results, so there is room for improvement.
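
One possible improvement, sketched below as a replacement for the last three lines of search(): fetch the first few pages instead of one. It assumes hudbt follows the NexusPHP-style page query parameter; verify the parameter name against the site's own pagination links.

# Sketch only: "&page=N" (0-based) is an assumption based on typical NexusPHP sites
for page in range(3):  # first three pages, roughly 150 results
    html = requests.get(query + '&page={}'.format(page),
                        headers=headers, verify=False).content.decode()
    parser.feed(html)
parser.close()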


You can sort the results, right-click a result to open its details page, or download it directly from the list.


Conclusion


Tinkering is never easy; cherish the journey. Now that this approach can pull results from the site, you can match them against what qBittorrent already holds. Spend a little time on qbittorrent-api and automated cross-seeding no longer feels far away. And it works for private and public sites alike: everything you want, haha~~



Unless otherwise stated, all articles on this blog are licensed under CC BY-SA 4.0. Please credit the source when reposting!