instagram use solr instead postgresql gis

instagram的技术点可参考 :

http://instagram-engineering.tumblr.com/post/13649370142/what-powers-instagram-hundreds-of-instances-dozens-of

从文章来看, 在地理位置搜索方面, 他们使用了solr.

也算专车专用吧.

For our geo-search API, we used PostgreSQL for many months, but once our Media entries were sharded, moved over to using Apache Solr. It has a simple JSON interface, so as far as our application is concerned, it’s just another API to consume.

Apache Solr

SolrTM is the popular, blazing fast open source enterprise search platform from the Apache LuceneTMproject. Its major features include powerful full-text search, hit highlighting, faceted search, near real-time indexing, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly reliable, scalable and fault tolerant, providing distributed indexing, replication and load-balanced querying, automated failover and recovery, centralized configuration and more. Solr powers the search and navigation features of many of the world's largest internet sites.

Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Jetty. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language. Solr's powerful external configuration allows it to be tailored to almost any type of application without Java coding, and it has an extensive plugin architecture when more advanced customization is required.

See the complete feature list for more details.

For more information about Solr, please see the Solr wiki.

SolrTM Features

Solr is a standalone enterprise search server with a REST-like API. You put documents in it (called "indexing") via XML, JSON, CSV or binary over HTTP. You query it via HTTP GET and receive XML, JSON, CSV or binary results.

Advanced Full-Text Search Capabilities
Optimized for High Volume Web Traffic
Standards Based Open Interfaces - XML, JSON and HTTP
Comprehensive HTML Administration Interfaces
Server statistics exposed over JMX for monitoring
Linearly scalable, auto index replication, auto failover and recovery
Near Real-time indexing
Flexible and Adaptable with XML configuration
Extensible Plugin Architecture

Solr Uses the LuceneTM Search Library and Extends it!

A Real Data Schema, with Numeric Types, Dynamic Fields, Unique Keys
Powerful Extensions to the Lucene Query Language
Faceted Search and Filtering
Geospatial Search with support for multiple points per document and geo polygons
Advanced, Configurable Text Analysis
Highly Configurable and User Extensible Caching
Performance Optimizations
External Configuration via XML
An AJAX based administration interface
Monitorable Logging
Fast near real-time incremental indexing and index replication
Highly Scalable Distributed search with sharded index across multiple hosts
JSON, XML, CSV/delimited-text, and binary update formats
Easy ways to pull in data from databases and XML files from local disk and HTTP sources
Rich Document Parsing and Indexing (PDF, Word, HTML, etc) using Apache Tika
Apache UIMA integration for configurable metadata extraction
Multiple search indices

Detailed Features

Schema

Defines the field types and fields of documents
Can drive more intelligent processing
Declarative Lucene Analyzer specification
Dynamic Fields enables on-the-fly addition of new fields
CopyField functionality allows indexing a single field multiple ways, or combining multiple fields into a single searchable field
Explicit types eliminates the need for guessing types of fields
External file-based configuration of stopword lists, synonym lists, and protected word lists
Many additional text analysis components including word splitting, regex and sounds-like filters
Pluggable similarity model per field

Query

HTTP interface with configurable response formats (XML/XSLT, JSON, Python, Ruby, PHP, Velocity, CSV, binary)
Sort by any number of fields, and by complex functions of numeric fields
Advanced DisMax query parser for high relevancy results from user-entered queries
Highlighted context snippets
Faceted Searching based on unique field values, explicit queries, date ranges, numeric ranges or pivot
Multi-Select Faceting by tagging and selectively excluding filters
Spelling suggestions for user queries
More Like This suggestions for given document
Function Query - influence the score by user specified complex functions of numeric fields or query relevancy scores.
Range filter over Function Query results
Date Math - specify dates relative to "NOW" in queries and updates
Dynamic search results clustering using Carrot2
Numeric field statistics such as min, max, average, standard deviation
Combine queries derived from different syntaxes
Auto-suggest functionality for completing user queries
Allow configuration of top results for a query, overriding normal scoring and sorting
Simple join capability between two document types
Performance Optimizations

Core

Dynamically create and delete document collections without restarting
Pluggable query handlers and extensible XML data format
Pluggable user functions for Function Query
Customizable component based request handler with distributed search support
Document uniqueness enforcement based on unique key field
Duplicate document detection, including fuzzy near duplicates
Custom index processing chains, allowing document manipulation before indexing
User configurable commands triggered on index changes
Ability to control where docs with the sort field missing will be placed
"Luke" request handler for corpus information

Caching

Configurable Query Result, Filter, and Document cache instances
Pluggable Cache implementations, including a lock free, high concurrency implementation
Cache warming in background
When a new searcher is opened, configurable searches are run against it in order to warm it up to avoid slow first hits. During warming, the current searcher handles live requests.
Autowarming in background
The most recently accessed items in the caches of the current searcher are re-populated in the new searcher, enabling high cache hit rates across index/searcher changes.
Fast/small filter implementation
User level caching with autowarming support

SolrCloud

Centralized Apache ZooKeeper based configuration
Automated distributed indexing/sharding - send documents to any node and it will be forwarded to correct shard
Near Real-Time indexing with immediate push-based replication (also support for slower pull-based replication)
Transaction log ensures no updates are lost even if the documents are not yet indexed to disk
Automated query failover, index leader election and recovery in case of failure
No single point of failure

Admin Interface

Comprehensive statistics on cache utilization, updates, and queries
Interactive schema browser that includes index statistics
Replication monitoring
SolrCloud dashboard with graphical cluster node status
Full logging control
Text analysis debugger, showing result of every stage in an analyzer
Web Query Interface w/ debugging output
Parsed query output
Lucene explain() document score detailing
Explain score for documents outside of the requested range to debug why a given document wasn't ranked higher.

The Apache Software Foundation

The Apache Software Foundation provides support for the Apache community of open-source software projects. The Apache projects are defined by collaborative consensus based processes, an open, pragmatic software license and a desire to create high quality software that leads the way in its field. Apache Lucene, Apache Solr, Apache PyLucene, Apache Open Relevance Project and their respective logos are trademarks of The Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their respective owners.

[参考]

1. http://lucene.apache.org/solr/

2. http://instagram-engineering.tumblr.com/post/13649370142/what-powers-instagram-hundreds-of-instances-dozens-of

时间： 2024-10-03 12:45:14

instagram use solr instead postgresql gis

Apache Solr

SolrTM Features

Solr Uses the LuceneTM Search Library and Extends it!

Detailed Features

Schema

Query

Core

Caching

SolrCloud

Admin Interface

The Apache Software Foundation

instagram use solr instead postgresql gis的相关文章

Greenplum 空间(GIS)数据检索 B-Tree & GiST 索引实践 - 阿里云HybridDB for PostgreSQL最佳实践

为PostgreSQL讨说法 - 浅析《UBER ENGINEERING SWITCHED FROM POSTGRES TO MYSQL》

RDS PostgreSQL\HDB PG 毫秒级海量时空数据透视典型案例分享

德哥的PostgreSQL私房菜 - 史上最屌PG资料合集

旋转门数据压缩算法在PostgreSQL中的实现 - 流式压缩在物联网、监控、传感器等场景的应用

mongoDB BI 分析利器 - PostgreSQL FDW (MongoDB Connector for BI)

行存、列存，堆表、AO表性能对比 - 阿里云HDB for PostgreSQL最佳实践

蜂巢的艺术与技术价值 - PostgreSQL PostGIS's hex-grid

PostgreSQL 空间、多维序列生成方法