Original article: Building a Full-Text Search App Using Django, Docker and Elasticsearch - 2020.08.20

Author: Aymane Mimouni

GitHub: aymaneMx/django-elasticsearch

1. Elasticsearch Basic Concepts

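Elasticsearch is a real-time, distributed, open-source full-text search and analytics engine. It exposes a RESTful web API and stores data as schema-less JSON documents. It is written in Java, so it runs on many platforms, and it can search and analyze large volumes of data quickly.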
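To make the RESTful, JSON-based interface concrete, here is a minimal standard-library sketch that builds, but does not send, the HTTP request that would store one JSON document. The host, the index name `posts`, the function name and the document contents are illustrative assumptions. The `PUT /<index>/_doc/<id>` endpoint is the Elasticsearch 7.x document index API.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_id, document):
    """Build (but do not send) a request that stores `document` under `doc_id`.

    Uses the Elasticsearch 7.x document API: PUT /<index>/_doc/<id>.
    """
    url = f"{base_url.rstrip('/')}/{index}/_doc/{doc_id}"
    body = json.dumps(document).encode("utf-8")  # documents travel as JSON
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_index_request(
    "http://localhost:9200", "posts", 1,
    {"title": "Hello Elasticsearch", "likes": 3, "draft": False},
)
print(req.get_method(), req.full_url)

# Sending it requires a running Elasticsearch node, e.g.:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["result"])  # "created" or "updated"
```

Building the request separately from sending it keeps the example runnable without a cluster. When a node is running, passing the request to `urllib.request.urlopen` would send it.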

Four important Elasticsearch concepts:

[1] - Fields

The smallest unit of data in Elasticsearch. Each field has a defined data type. Core types include strings, numbers, dates and booleans, and complex types include object and nested.

[2] - Index

A collection of documents and their properties. It is loosely analogous to a database in a relational database system.

[3] - Documents

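A collection of fields, defined in a specific way and expressed as JSON. Each document belongs to a type (in Elasticsearch 7.x, always the single type `_doc`) and lives inside an index. It is similar to a row in a relational database table.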

[4] - Mapping

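Defines which fields the documents in an index contain, what data type each field has, and how the fields are indexed. It is similar to a schema in a relational database.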

As illustrated in the figure.
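To tie the four concepts together, the sketch below writes out a mapping for an index named `posts` as plain Python data, along with one document that fits it. Every key in the document is a field. The field names and types are assumptions loosely modelled on the `Post` model in section 2.3. The body has the shape accepted by the Elasticsearch 7.x create-index API (`PUT /posts`). The helper `unmapped_fields` is hypothetical.

```python
# Index settings plus mapping for a hypothetical "posts" index (assumed types).
POSTS_INDEX_BODY = {
    "settings": {"number_of_shards": 1, "number_of_replicas": 0},
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "content": {"type": "text"},
            "created_at": {"type": "date"},
            "likes": {"type": "integer"},
            "draft": {"type": "boolean"},
            "slug": {"type": "keyword"},
            # A complex (object) field grouping sub-fields inside one document
            "user": {
                "type": "object",
                "properties": {
                    "id": {"type": "integer"},
                    "username": {"type": "text"},
                },
            },
        }
    },
}

# A document: a JSON object whose fields follow the mapping above.
example_document = {
    "title": "Full-text search with Django",
    "content": "Elasticsearch stores JSON documents.",
    "created_at": "2020-08-20T10:00:00Z",
    "likes": 3,
    "draft": False,
    "slug": "full-text-search-with-django",
    "user": {"id": 1, "username": "alice"},
}


def unmapped_fields(document, properties):
    """Return the top-level document fields that the mapping does not declare."""
    return sorted(set(document) - set(properties))


print(unmapped_fields(example_document, POSTS_INDEX_BODY["mappings"]["properties"]))
# → []
```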

2. Elasticsearch with Django

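Django-Haystack is one option. Its query syntax resembles the Django ORM, and it is easy to get started with.

This article instead runs Elasticsearch in Docker and uses Django Elasticsearch DSL. The implementation is available at aymaneMx/django-elasticsearch.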

2.1. Elasticsearch Instance

Install docker-compose:

pip install docker-compose
# check the installed version
docker-compose --version

2.1.1. docker-compose.yml

Configure docker-compose.yml, for example:

version: '3.7'

services:
  es: # Elasticsearch service configuration
    image: elasticsearch:7.8.1
    environment:
      - discovery.type=single-node
    ports:
      - "9200:9200"

  db:
    image: "postgres:11"
    environment:
      - "POSTGRES_HOST_AUTH_METHOD=trust"
    volumes:
      - postgres_data:/var/lib/postgresql/data/

  web: # Django service (with Elasticsearch added)
    build: .
    command: python /code/manage.py runserver 0.0.0.0:8000
    volumes:
      - .:/code
    ports:
      - "8000:8000"
    env_file:
      - docker-compose.env
    depends_on:
      - db
      - es # add Elasticsearch as a dependency

volumes:
  postgres_data:

2.1.2. docker-compose.env

The contents of docker-compose.env, for example:

SECRET_KEY=***secret***
ELASTICSEARCH_DSL_HOSTS=es:9200

2.1.3. Start the Services

Run:

docker-compose up -d --build

Check with curl that Elasticsearch is working:

curl -X GET localhost:9200/_cluster/health
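You can also script the same check. The standard-library sketch below polls `/_cluster/health` until the `status` field is `green` or `yellow`. The function and constant names are hypothetical and are not part of the project. On a single node, `yellow` is normal when an index has replicas configured, because replicas cannot be allocated. The fetch, sleep and clock functions are injectable so the logic can be exercised without a cluster.

```python
import json
import time
import urllib.request

# "red" means at least one primary shard is unassigned: treat it as not ready.
HEALTHY_STATUSES = {"green", "yellow"}


def fetch_health(url="http://localhost:9200/_cluster/health"):
    """GET the cluster-health endpoint and return its parsed JSON body."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def wait_for_elasticsearch(fetch=fetch_health, timeout=60.0, interval=2.0,
                           sleep=time.sleep, clock=time.monotonic):
    """Poll until the cluster status is green or yellow; raise TimeoutError otherwise."""
    deadline = clock() + timeout
    while True:
        try:
            status = fetch().get("status")
            if status in HEALTHY_STATUSES:
                return status
        except (OSError, ValueError):
            # Connection refused/reset, or a non-JSON reply while the node boots
            pass
        if clock() >= deadline:
            raise TimeoutError("Elasticsearch did not become healthy in time")
        sleep(interval)

# Usage against the compose setup above (needs the es container running):
# print(wait_for_elasticsearch())
```

In the compose file, `depends_on` only controls the order in which containers start. It does not wait for Elasticsearch to be ready, which is when a readiness loop like this can help.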

2.2. Set Up Elasticsearch

Install Django Elasticsearch DSL:

pip install django-elasticsearch-dsl

First, as with any Django app, add django_elasticsearch_dsl to INSTALLED_APPS in the Django settings file:

INSTALLED_APPS = [
    ...
    'django_elasticsearch_dsl',
    ...
]

Then you must define ELASTICSEARCH_DSL in the Django settings:

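import os

# Elasticsearch
ELASTICSEARCH_DSL = {
    'default': {
        # "es:9200" inside Docker Compose (set in docker-compose.env),
        # "localhost:9200" when running outside Docker
        'hosts': os.getenv("ELASTICSEARCH_DSL_HOSTS", 'localhost:9200')
    },
}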

2.3. Index Data into Elasticsearch

Take the following model as an example:

from django.contrib.auth.models import User
from django.db import models
from django.utils import timezone


class Post(models.Model):
    title = models.CharField(max_length=128)
    content = models.CharField(max_length=5000)
    created_at = models.DateTimeField(default=timezone.now)
    likes = models.PositiveIntegerField(default=0)
    slug = models.SlugField(max_length=128, db_index=True, null=True)
    draft = models.BooleanField(default=True)
    user = models.ForeignKey(
        User,
        related_name='posts',  # lets documents.py use user.posts.all()
        on_delete=models.CASCADE
    )

    def __str__(self):
        return self.title

    class Meta:
        app_label = 'posts'

Then run the migrations:

docker-compose run web python manage.py makemigrations
docker-compose run web python manage.py migrate

Next, define the Elasticsearch index. This is done with a Document class in a documents.py file inside the app directory:

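from django.contrib.auth import get_user_model
from django_elasticsearch_dsl import Document, fields
from django_elasticsearch_dsl.registries import registry

from .models import Post, Reply  # Reply: another model of the posts app (not shown)

User = get_user_model()


@registry.register_document
class PostDocument(Document):
    # Index the related user as a nested JSON object
    user = fields.ObjectField(properties={
        'id': fields.IntegerField(),
        'username': fields.TextField(),
    })

    class Index:
        # Name of the Elasticsearch index
        name = 'posts'
        settings = {'number_of_shards': 1,
                    'number_of_replicas': 0}

    class Django:
        model = Post  # the model associated with this Document

        # Model fields to index in Elasticsearch
        fields = [
            'title',
            'content',
            'created_at',
            'likes',
            'draft',
            'slug',
        ]

        # Re-index posts when a related User or Reply is saved;
        # this is what makes get_instances_from_related() get called
        related_models = [User, Reply]

    def get_queryset(self):
        # Load each post's user in the same query when indexing many posts
        return super().get_queryset().select_related('user')

    def get_instances_from_related(self, related_instance):
        # Map a changed related object back to the Post(s) to re-index
        if isinstance(related_instance, User):
            return related_instance.posts.all()
        elif isinstance(related_instance, Reply):
            return related_instance.post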
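To see roughly what this Document indexes, the following plain-Python sketch builds the JSON document for one post without Django or Elasticsearch. `FakeUser`, `FakePost` and `to_document` are hypothetical stand-ins. The real conversion is done by django-elasticsearch-dsl from the `fields` list and the `ObjectField` declared above, and its exact output (for example, date formatting) may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


# Hypothetical stand-ins for the Django User and Post models (no Django needed).
@dataclass
class FakeUser:
    id: int
    username: str


@dataclass
class FakePost:
    title: str
    content: str
    created_at: datetime
    likes: int
    draft: bool
    slug: str
    user: FakeUser


# Mirrors the field list declared in PostDocument.Django.fields.
DOCUMENT_FIELDS = ["title", "content", "created_at", "likes", "draft", "slug"]


def to_document(post):
    """Roughly the JSON-ready dict PostDocument would index for one post."""
    doc = {name: getattr(post, name) for name in DOCUMENT_FIELDS}
    doc["created_at"] = post.created_at.isoformat()  # dates travel as strings
    # ObjectField(properties={'id', 'username'}) becomes a nested JSON object
    doc["user"] = {"id": post.user.id, "username": post.user.username}
    return doc


post = FakePost(
    title="Hello", content="First post",
    created_at=datetime(2020, 8, 20, tzinfo=timezone.utc),
    likes=2, draft=False, slug="hello", user=FakeUser(1, "alice"),
)
print(to_document(post)["user"])
# → {'id': 1, 'username': 'alice'}
```

The nested `user` object explains the `select_related('user')` in `get_queryset`. Without it, building documents for many posts would issue an extra query per post to load its user.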

2.4. Usage Examples

To have some data to test against, add content to the database with the project's load_posts management command:

docker-compose run web python manage.py load_posts 20

Then open a Python shell to query Elasticsearch:

# open the Python shell
docker-compose run web python manage.py shell

Query Elasticsearch:

from posts.documents import PostDocument

posts = PostDocument.search()
for hit in posts:
    print(hit.title)

Sample output:

Information analysis list professional couple feeling main.
Box say least religious probably win Republican.
Structure somebody project huge these.
Last early fast country for skill campaign.
Yes result town mention study leader.
Magazine quite third least western.
State hear cover magazine kind.
Interest ago increase look president modern figure.
Court three my kid scientist.
So per brother high collection war.
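The sample shows ten titles even though 20 posts were loaded. This is consistent with Elasticsearch's default page size: a search returns 10 hits unless a `size` is given. With elasticsearch-dsl, you can slice the Search object (for example `PostDocument.search()[0:20]`, which sets `from` and `size`) or iterate over every hit with `.scan()`. The helpers below are a hypothetical sketch of the page arithmetic behind `from`/`size`.

```python
def page_window(page, per_page=10):
    """Return the from/size pair that selects one page of hits.

    `page` is 1-based; Elasticsearch returns 10 hits when size is not given.
    """
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    return {"from": (page - 1) * per_page, "size": per_page}


def slice_for_page(page, per_page=10):
    """The equivalent slice for an elasticsearch-dsl Search, e.g. search[10:20]."""
    window = page_window(page, per_page)
    return slice(window["from"], window["from"] + window["size"])


print(page_window(2))
```

In the Django shell, `for hit in PostDocument.search()[slice_for_page(2)]: print(hit.title)` would print the 11th to 20th hits.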

More Elasticsearch query examples:

from elasticsearch_dsl import Q

from posts.documents import PostDocument

# Independent examples: each call returns a new Search object, and
# chaining them as written combines all of the conditions (AND).
search = PostDocument.search()

# Filter by single field equal to a value
search = search.query('match', draft=False)
# Filter by single field containing a phrase
search = search.filter('match_phrase', title="value")

# Add a Q query object to the Search object
q = Q("multi_match", query='python django', fields=['title', 'content'])
search = search.query(q)

# Query combination: | means OR, & means AND
or_q = Q("match", title='python') | Q("match", title='django')
and_q = Q("match", title='python') & Q("match", title='django')
search = search.query(or_q)

# Exclude items from your query
search = search.exclude('match', draft=True)

# Filter documents whose field value lies within a range,
# e.g. the posts created during the past day
search = search.filter('range', created_at={"gte": "now-1d"})

# Ordering: a leading - sign specifies descending order
search = search.sort('-likes', 'created_at')
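These calls only modify a Search object. elasticsearch-dsl serializes it to a JSON request body when the search runs, and `search.to_dict()` shows that body. The plain-Python sketch below builds comparable bodies by hand to show how the pieces map onto the Query DSL:

- `query()` clauses go under `bool.must`.
- `filter()` clauses go under `bool.filter`.
- `exclude()` ends up as a negated (`must_not`) clause.
- `|` and `&` become `bool.should` and `bool.must`.

The helper names are hypothetical, and the real serialized output may be nested slightly differently.

```python
import json


def match(field, value):
    """A single match clause."""
    return {"match": {field: value}}


def any_of(*queries):
    """OR: at least one clause should match (roughly what Q(...) | Q(...) builds)."""
    return {"bool": {"should": list(queries)}}


def all_of(*queries):
    """AND: every clause must match (roughly what Q(...) & Q(...) builds)."""
    return {"bool": {"must": list(queries)}}


def post_search_body(text, since="now-1d"):
    """Approximate body for a full-text search over recent, published posts."""
    return {
        "query": {
            "bool": {
                # Scored full-text clause, as in search.query(q)
                "must": [{"multi_match": {"query": text,
                                          "fields": ["title", "content"]}}],
                # Non-scoring clause, as in search.filter('range', ...)
                "filter": [{"range": {"created_at": {"gte": since}}}],
                # Excluded documents, as in search.exclude('match', draft=True)
                "must_not": [match("draft", True)],
            }
        },
        # '-likes' sorts descending; a plain field name sorts ascending
        "sort": [{"likes": {"order": "desc"}}, "created_at"],
    }


print(json.dumps(any_of(match("title", "python"), match("title", "django"))))
print(json.dumps(post_search_body("python django"), indent=2))
```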

3. django-elasticsearch

GitHub: aymaneMx/django-elasticsearch

3.1. Run the Project

docker-compose up -d --build

With DEBUG=True enabled, the app is available at http://127.0.0.1:8000/.

Django admin: http://127.0.0.1:8000/admin

Elasticsearch: http://127.0.0.1:9200/

3.2. Test Elasticsearch

First, run:

docker-compose run web python manage.py migrate
docker-compose run web python manage.py load_posts 20

Open the Python shell:

docker-compose run web python manage.py shell

Query Elasticsearch:

from posts.documents import PostDocument
posts = PostDocument.search()
for hit in posts:
    print(hit.title)

Last modification: April 4th, 2021 at 12:07 pm