How To Install Apache Solr on Manjaro
In this tutorial, we will show you how to install Apache Solr on Manjaro. Apache Solr is a powerful, open-source search platform built on the Apache Lucene library, and it gives organizations enterprise-class search capabilities for modern applications. If you’re running Manjaro Linux and need a sophisticated search solution, Solr provides a scalable, fault-tolerant search engine that excels at full-text search, hit highlighting, and faceted search.
Understanding Apache Solr
Apache Solr is more than a simple search tool: it’s a comprehensive search platform designed to handle large volumes of data efficiently. At its core, Solr is a Java application that runs as a standalone server and exposes its functionality through REST-like HTTP APIs that speak JSON and XML. This makes it easy to integrate with applications written in virtually any programming language.
Solr’s architecture centers around the concept of inverted indexes, which allow for lightning-fast search capabilities across millions of documents. Unlike traditional databases that excel at structured data retrieval, Solr specializes in text search, offering sophisticated linguistic processing, relevance ranking, and faceted navigation. This makes it particularly valuable for applications where users need to find specific information within large text collections quickly.
When compared to alternatives such as Elasticsearch or the full-text search features built into databases, Solr distinguishes itself through its maturity, stability, and extensive documentation. Elasticsearch is often chosen for its distributed capabilities, but Solr offers comparable features through SolrCloud, which provides distributed indexing, replication, and load-balanced querying.
Common use cases for Solr include:
- E-commerce site search engines
- Content management system search
- Enterprise document search
- Log and event data search
- Geospatial search applications
On Manjaro Linux, a popular Arch-based distribution, Solr runs well, and the rolling-release model gives you easy access to current Java releases and other up-to-date packages.
Prerequisites
Before diving into the installation process, ensure your Manjaro system meets the necessary requirements for running Apache Solr efficiently:
System Requirements:
- At least 2GB RAM (4GB or more recommended for production)
- Minimum 1GB free disk space (more for indexed data)
- A modern multi-core processor
- Updated Manjaro installation
Software Requirements:
- Java 11 or newer (OpenJDK or Oracle JDK)
- Administrative privileges (sudo access)
- Terminal access
First, update your Manjaro system to ensure you have the latest packages:
sudo pacman -Syu
Next, verify if Java is already installed on your system:
java -version
If Java isn’t installed or is an older version, install OpenJDK 11:
sudo pacman -S jre11-openjdk-headless
For development environments, you might prefer the full JDK:
sudo pacman -S jdk11-openjdk
Once Java is installed, verify that the correct version is active:
java -version
The output should indicate Java 11 or newer. If you have multiple Java versions installed, you may need to set the default version with the archlinux-java helper:
sudo archlinux-java set java-11-openjdk
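If you automate this setup, the version check can be scripted. The following is a minimal sketch (java_major_version and require_java are our own hypothetical helper names, not part of any package) that pulls the major version out of java -version output, handling both the legacy 1.8.0_x numbering and the modern 11.0.x numbering:

```shell
#!/usr/bin/env bash
# Print the Java major version from `java -version` output supplied on stdin.
# Handles the legacy "1.8.0_402" scheme as well as "11.0.22" and "21".
java_major_version() {
  local line ver
  line=$(head -n 1)   # the first line carries the quoted version string
  ver=$(printf '%s\n' "$line" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  if [[ $ver == 1.* ]]; then
    ver=${ver#1.}     # legacy scheme: 1.8.0_402 -> 8.0_402
  fi
  printf '%s\n' "${ver%%[!0-9]*}"   # keep the leading digits: 11.0.22 -> 11
}

# Return non-zero unless the java on PATH is at least the given major version.
require_java() {
  local min=${1:-11} major
  # java -version writes to stderr, so redirect it into the pipe
  major=$(java -version 2>&1 | java_major_version)
  if [[ -z $major || $major -lt $min ]]; then
    echo "Java $min or newer is required (found: ${major:-none})" >&2
    return 1
  fi
  echo "Found Java $major"
}
```

For example, require_java 11 succeeds and reports the version only when Java 11 or newer is the active runtime.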
With these prerequisites satisfied, you’re ready to proceed with installing Apache Solr.
Method 1: Installing Solr via Package Manager
The simplest way to install Apache Solr on Manjaro is through the package manager. This method handles dependencies automatically and integrates Solr with your system’s service management.
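First, ensure your system is fully up to date. On Arch-based systems, always refresh the package database together with an upgrade, because running pacman -Sy on its own can leave you with a partially upgraded system:
sudo pacman -Syu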
Solr is generally not available in Manjaro’s official repositories; it is packaged in the Arch User Repository (AUR). To access AUR packages, you need an AUR helper like yay. If you don’t have yay installed:
sudo pacman -S git base-devel
git clone https://aur.archlinux.org/yay.git
cd yay
makepkg -si
Now, search for Solr packages:
yay -Ss solr
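Install Apache Solr, using the package name shown in the search results (AUR package names can change, so adjust the command below if needed):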
yay -S apache-solr
During installation, yay will download approximately 268.5 MB of files, which will expand to about 327 MB after installation. The process will install all necessary dependencies automatically.
After installation completes, enable and start the Solr service:
sudo systemctl enable solr.service
sudo systemctl start solr.service
Verify that Solr is running correctly:
sudo systemctl status solr.service
The output should show that Solr is active and running. You can also check if Solr is responding by opening its web interface at http://localhost:8983/solr/ in your browser.
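Solr can take several seconds to start, so scripts that run right after the service starts may want to wait until it actually answers. A minimal sketch, assuming the default http://localhost:8983/solr base URL and using Solr’s /admin/info/system endpoint as the liveness probe (wait_for_solr is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Poll Solr until it answers or the attempts run out.
# Arguments (all optional): base URL, number of attempts, seconds between attempts.
wait_for_solr() {
  local url=${1:-http://localhost:8983/solr} tries=${2:-30} delay=${3:-2} i
  for ((i = 1; i <= tries; i++)); do
    # -f: treat HTTP errors (e.g. 503 while starting) as failures; -s: no progress bar
    if curl -fs -o /dev/null "$url/admin/info/system"; then
      echo "Solr is up at $url"
      return 0
    fi
    if ((i < tries)); then
      sleep "$delay"
    fi
  done
  echo "Solr did not respond at $url after $tries attempts" >&2
  return 1
}
```

Run it with no arguments for the defaults, or as wait_for_solr http://localhost:8983/solr 60 1 to allow up to about a minute.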
This installation method provides a clean, well-integrated setup that follows Manjaro’s system conventions and enables easier updates through the package manager.
Method 2: Manual Installation from Apache Website
If you prefer more control over your installation or need a specific version of Solr that isn’t available in the repositories, the manual installation method offers greater flexibility.
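First, download the Solr release you want from the Apache website (this section uses 9.8.1). Visit the official download page, or use wget; the ?action=download suffix makes Apache’s mirror script redirect to the archive instead of returning an HTML page:
wget -O solr-9.8.1.tgz "https://www.apache.org/dyn/closer.lua/solr/solr/9.8.1/solr-9.8.1.tgz?action=download"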
Next, verify the downloaded file’s integrity using the provided checksums:
wget https://downloads.apache.org/solr/solr/9.8.1/solr-9.8.1.tgz.sha512
sha512sum -c solr-9.8.1.tgz.sha512
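sha512sum -c expects the checksum file to name the archive exactly as it is saved locally. If the check fails only because of the file’s format or name, the sketch below compares just the digest, assuming the first field of the .sha512 file holds the hash (verify_sha512 is a hypothetical helper). A matching checksum only shows the download is intact; to confirm who published it, also verify the .asc signature with gpg against the Solr project’s KEYS file.

```shell
#!/usr/bin/env bash
# Compare a file's SHA-512 digest with the first field of a checksum file.
# Accepts "HASH  filename", "HASH *filename", or a bare "HASH".
verify_sha512() {
  local file=$1 sumfile=$2 expected actual
  expected=$(awk 'NR == 1 { print tolower($1) }' "$sumfile")
  actual=$(sha512sum "$file" | awk '{ print $1 }')
  if [[ -n $expected && $expected == "$actual" ]]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file" >&2
    return 1
  fi
}
```

Usage: verify_sha512 solr-9.8.1.tgz solr-9.8.1.tgz.sha512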
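After verification, extract the service installation script from the archive:
tar xzf solr-9.8.1.tgz solr-9.8.1/bin/install_solr_service.sh --strip-components=2
Then run it, passing the path to the archive:
sudo bash ./install_solr_service.sh solr-9.8.1.tgz
The script extracts Solr to /opt/solr-9.8.1, creates a /opt/solr symbolic link pointing to it, creates a solr user, places data and logs under /var/solr, and writes its environment settings to /etc/default/solr.in.sh. Because /opt/solr is only a link, several versions can live side by side under /opt.
Be aware that the script recognizes only a handful of distributions (such as Debian, Ubuntu, Red Hat, and SUSE) and registers Solr through an init script. On Manjaro it may stop with an “unsupported distribution” error. In that case, extract the archive under /opt yourself, create the /opt/solr link, and run Solr as a dedicated non-root user (bin/solr refuses to start as root unless forced), either with /opt/solr/bin/solr start or through a systemd unit of your own.
The manual installation method gives you precise control over where and how Solr is installed, making it easier to maintain multiple versions or apply custom configurations. However, it doesn’t integrate with the system package manager, which means you’ll need to handle updates manually.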
Understanding Solr Directory Structure
After installation, it’s important to understand Solr’s directory structure to effectively manage your search platform.
The main directories in a typical service installation (the layout created by install_solr_service.sh; package-manager installs may use different paths) include:
- /opt/solr/bin/: Executable scripts for starting, stopping, and managing Solr
- /opt/solr/server/: The core server files, including the Jetty web server
- /opt/solr/server/solr/: Default Solr home (cores and configurations) when Solr is started directly from the extracted archive
- /var/solr/data/: Solr home holding cores and indexed data on a service installation
- /var/solr/logs/: Log files on a service installation
- /etc/default/solr.in.sh: Environment configuration file (service installation)
The bin directory contains several useful scripts:
- solr: The main script for starting and stopping Solr
- post: Tool for indexing documents
- solr.in.sh: Environment settings for the Solr process
When you run Solr as a service, its files are split between /opt/solr (the application) and /var/solr (data and logs). This separation helps maintain a clean distinction between the application and its data, making upgrades easier.
Understanding this structure is crucial for effective administration, troubleshooting, and customization of your Solr installation. For example, if you need to change how Solr handles specific queries, you’ll edit the configuration files in the core’s conf directory under the Solr home (/var/solr/data on a service installation). If you’re concerned about disk space usage, you’ll want to monitor the data directory.
Initial Configuration
Before using Solr in a production environment, some initial configuration is necessary to optimize its performance and security.
The main configuration file is solr.in.sh, which on a service installation is located at /etc/default/solr.in.sh. This file controls various aspects of Solr’s runtime environment, including memory allocation, network settings, and security options.
Edit this file to adjust Solr’s memory allocation:
sudo nano /etc/default/solr.in.sh
Find and modify the memory setting to suit your system’s resources:
SOLR_HEAP="2g"
#SOLR_JAVA_MEM="-Xms1g -Xmx2g"
The SOLR_HEAP setting is a shorthand that sets both the minimum and maximum heap size to the same value, while SOLR_JAVA_MEM allows more granular control. Use only one of them: when SOLR_HEAP is set it takes precedence, so comment it out if you switch to SOLR_JAVA_MEM.
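To check which heap flags a given solr.in.sh will actually produce, you can evaluate it the way the start script does. This sketch assumes the precedence described in the comments of the stock solr.in.sh (SOLR_HEAP wins when set, otherwise SOLR_JAVA_MEM applies) and an assumed 512m fallback when neither is set. effective_solr_heap is a hypothetical helper, and because it sources the file as shell code, only point it at files you trust:

```shell
#!/usr/bin/env bash
# Print the heap flags that a solr.in.sh file would produce.
# Assumed precedence: SOLR_HEAP, then SOLR_JAVA_MEM, then a 512m default.
# WARNING: the file is sourced, i.e. executed as shell code.
effective_solr_heap() {
  local include=${1:-/etc/default/solr.in.sh}
  (
    # Subshell: variables from the sourced file do not leak into the caller.
    SOLR_HEAP='' SOLR_JAVA_MEM=''
    . "$include" || exit 1
    if [[ -z $SOLR_HEAP && -n $SOLR_JAVA_MEM ]]; then
      printf '%s\n' "$SOLR_JAVA_MEM"
    else
      heap=${SOLR_HEAP:-512m}
      printf '%s\n' "-Xms$heap -Xmx$heap"
    fi
  )
}
```

For example, sudo bash -c '. ./heap.sh; effective_solr_heap' (with the function saved as heap.sh) reads /etc/default/solr.in.sh; with both variables set as in the original example, it prints the SOLR_HEAP flags, showing that SOLR_JAVA_MEM is ignored.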
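Configure network settings by locating and adjusting:
SOLR_JETTY_HOST="127.0.0.1"
SOLR_PORT="8983"
For security reasons, Solr 9 binds to localhost only by default, and SOLR_JETTY_HOST controls the address it listens on. If you need to access Solr from other machines, set SOLR_JETTY_HOST to your server’s IP address or “0.0.0.0” to bind to all interfaces, and protect the port with authentication or a firewall. (SOLR_HOST, by contrast, sets the hostname Solr advertises to other nodes, not the listen address.)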
You can also configure logging verbosity:
SOLR_LOG_LEVEL="INFO"
Valid levels include TRACE, DEBUG, INFO, WARN, ERROR, and FATAL, with INFO being a good default for most installations.
After making changes to the configuration file, restart Solr to apply them:
sudo systemctl restart solr
These initial configurations provide a solid foundation for your Solr installation. As you become more familiar with Solr’s capabilities and your specific requirements, you can make further adjustments to optimize performance, security, and functionality.
Setting Up Solr as a Service
Running Solr as a systemd service ensures it starts automatically on boot and can be easily managed using standard systemd commands. If you installed Solr via the package manager or using the install_solr_service.sh script, this should already be configured.
Check the status of the Solr service:
sudo systemctl status solr
If the service isn’t enabled to start at boot time, enable it:
sudo systemctl enable solr
The location of the unit file depends on how Solr was installed (for example /etc/systemd/system/ or /usr/lib/systemd/system/). You can display it, together with any overrides, to understand how Solr is configured to run:
systemctl cat solr
This file defines how systemd should start, stop, and monitor the Solr process. If you need to customize the service behavior, you can create an override file:
sudo systemctl edit solr
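This opens an editor where you can add custom settings. For example, to pass an additional Java system property through the SOLR_OPTS environment variable:
[Service]
Environment="SOLR_OPTS=-Dsolr.allowPaths=/custom/path"
systemd does not expand shell variables in Environment= lines (and treats % as the start of a specifier), so write the value out in full. Alternatively, add the option to solr.in.sh instead: SOLR_OPTS="$SOLR_OPTS -Dsolr.allowPaths=/custom/path"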
systemctl edit reloads the systemd configuration automatically when you save. If you edit unit files directly instead, reload the configuration yourself:
sudo systemctl daemon-reload
Then restart Solr to apply the changes:
sudo systemctl restart solr
Managing Solr through systemd provides several benefits:
- Automatic startup after system reboot
- Standard log management through journald
- Process monitoring and automatic restart on failure (when the unit sets Restart=on-failure)
- Dependency management (e.g., starting after network is available)
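For troubleshooting, you can view the output systemd has captured for the service (Solr’s detailed request and error logs go to solr.log in its logs directory):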
sudo journalctl -u solr
Or follow the logs in real-time:
sudo journalctl -u solr -f
This service-oriented approach aligns with modern Linux system administration practices and ensures your Solr installation remains reliable and manageable.
Configuring a Reverse Proxy with Nginx
While Solr includes its own web server (Jetty), you might want to set up a reverse proxy for added security, SSL termination, or to integrate Solr with other web applications. Nginx works exceptionally well for this purpose.
First, install Nginx if it’s not already present:
sudo pacman -S nginx
Enable and start the Nginx service:
sudo systemctl enable nginx
sudo systemctl start nginx
Create a directory for site configurations if it doesn’t exist:
sudo mkdir -p /etc/nginx/sites-available /etc/nginx/sites-enabled
Update the main Nginx configuration to include your site configurations:
sudo nano /etc/nginx/nginx.conf
Add the following line inside the http block:
include /etc/nginx/sites-enabled/*.conf;
Now, create a configuration file for your Solr proxy:
sudo nano /etc/nginx/sites-available/solr.conf
Add the following configuration:
server {
    listen 80;
    server_name solr.yourdomain.com;
    # Headers set at server level apply to every location below
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    location / {
        # 127.0.0.1 matches Solr's default IPv4 localhost binding
        proxy_pass http://127.0.0.1:8983;
    }
    # Basic security measures
    # Prevent access to configuration files served by each core's file handler
    location ~ ^/solr/[^/]+/admin/file {
        deny all;
    }
    # Require a password for every core's update handlers
    location ~ ^/solr/[^/]+/update {
        auth_basic "Restricted";
        auth_basic_user_file /etc/nginx/.htpasswd;
        proxy_pass http://127.0.0.1:8983;
    }
}
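This configuration proxies requests to your Solr instance while adding some basic security measures. It is only a starting point: the Admin UI and administrative APIs such as /solr/admin/cores remain reachable without a password, so consider protecting location / as well or enabling Solr’s own authentication. For user authentication, create a password file (on Manjaro, the htpasswd tool is provided by the apache package):
sudo pacman -S apache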
sudo htpasswd -c /etc/nginx/.htpasswd admin
Enable your configuration by creating a symbolic link:
sudo ln -s /etc/nginx/sites-available/solr.conf /etc/nginx/sites-enabled/
Test the Nginx configuration:
sudo nginx -t
If the test is successful, restart Nginx:
sudo systemctl restart nginx
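Once Nginx is running, it is worth confirming that the update handler really demands credentials. A minimal sketch, assuming the example domain solr.yourdomain.com and the core name mycorename used in this guide (check_update_protected is a hypothetical helper). The core does not need to exist yet, because Nginx answers with HTTP 401 before the request reaches Solr:

```shell
#!/usr/bin/env bash
# Confirm that Nginx asks for credentials before proxying a core's update handler.
# Defaults are the example domain and core name from this guide; replace them.
check_update_protected() {
  local base=${1:-http://solr.yourdomain.com} core=${2:-mycorename} code
  # -o /dev/null discards the body; -w prints only the HTTP status code
  code=$(curl -s -o /dev/null -w '%{http_code}' "$base/solr/$core/update")
  if [[ $code == 401 ]]; then
    echo "OK: $base/solr/$core/update requires authentication"
  else
    echo "WARNING: expected HTTP 401 but got ${code:-no response}" >&2
    return 1
  fi
}
```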
For enhanced security, consider adding SSL/TLS:
sudo pacman -S certbot certbot-nginx
sudo certbot --nginx -d solr.yourdomain.com
Certbot will automatically modify your Nginx configuration to use HTTPS with Let’s Encrypt certificates.
This reverse proxy setup provides several advantages:
- Adds an additional security layer
- Enables SSL/TLS encryption
- Allows for custom authentication
- Makes URL management more flexible
- Can provide load balancing if needed
Remember to keep Solr itself listening on localhost only (SOLR_JETTY_HOST="127.0.0.1" in solr.in.sh, the default since Solr 9) so that clients cannot bypass the proxy by connecting to port 8983 directly.
Creating and Managing Solr Cores
Solr organizes its indexes into “cores,” which are essentially individual indexes with their own configuration. Creating and managing cores is a fundamental aspect of Solr administration.
To create a new core via the command line:
sudo -u solr /opt/solr/bin/solr create -c mycorename -d _default
This creates a new core named “mycorename” from the _default configset. The -d parameter specifies which configset to copy as the core’s configuration; the data_driven_schema_configs set found in older guides was replaced by _default in Solr 7.
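You can also create cores through the Solr Admin UI. Navigate to http://localhost:8983/solr/ in your browser, click on “Core Admin” in the left menu, and then click “Add Core.” The instanceDir must already exist and contain a conf directory with solrconfig.xml and a schema file. Fill in the required fields: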
- name: The core name
- instanceDir: Directory where core files will be stored
- dataDir: Directory for index data
- config: Name of the solrconfig.xml file
- schema: Name of the schema.xml file
Each core has two primary configuration files:
- solrconfig.xml: Controls indexing and query processing
- managed-schema.xml (named managed-schema before Solr 9) or schema.xml: Defines field types and fields
To edit these files, navigate to the core’s configuration directory:
cd /var/solr/data/mycorename/conf/
After making changes to core configurations, reload the core so Solr picks them up. You can do this through the Core Admin API:
curl "http://localhost:8983/solr/admin/cores?action=RELOAD&core=mycorename"
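For cores that use the managed schema (the default in the _default configset), the Solr Reference Guide recommends changing fields through the Schema API rather than hand-editing the schema file of a running core. A minimal sketch that posts an add-field command; add_solr_field is a hypothetical helper, the field name and the text_general type are example values, and SOLR_URL is an assumed optional override:

```shell
#!/usr/bin/env bash
# Add a stored field to a core's managed schema through the Schema API.
add_solr_field() {
  local core=$1 name=$2 type=${3:-text_general}
  local solr=${SOLR_URL:-http://localhost:8983/solr}
  local body
  # Build the JSON command; assumes name and type contain no quotes.
  body=$(printf '{"add-field": {"name": "%s", "type": "%s", "stored": true}}' "$name" "$type")
  curl -fsS -X POST -H 'Content-Type: application/json' \
    --data-binary "$body" "$solr/$core/schema"
}
```

Usage: add_solr_field mycorename title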
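For backup purposes, you can create a snapshot of a standalone core through its replication handler. The backup location must be writable by the solr user and, if it lies outside the Solr home, listed in solr.allowPaths:
curl "http://localhost:8983/solr/mycorename/replication?command=backup&name=backup_name&location=/path/to/backup"
To restore from a backup (progress can be checked with command=restorestatus):
curl "http://localhost:8983/solr/mycorename/replication?command=restore&name=backup_name&location=/path/to/backup"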
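Backups run asynchronously, and giving each one a unique name makes it easier to keep several. A minimal sketch, assuming the standalone replication handler (command=backup) and a target directory Solr is allowed to write to; backup_core is a hypothetical helper that prints the generated name so you can pass it to a later restore:

```shell
#!/usr/bin/env bash
# Start a uniquely named backup of a standalone core via its replication handler
# and print the backup name for a later command=restore.
backup_core() {
  local core=$1 location=$2 solr=${3:-http://localhost:8983/solr} name
  name="${core}-$(date +%Y%m%d-%H%M%S)"
  # --get turns the --data-urlencode pairs into a URL-encoded query string
  curl -fsS --get "$solr/$core/replication" \
    --data-urlencode "command=backup" \
    --data-urlencode "name=$name" \
    --data-urlencode "location=$location" >/dev/null || return 1
  printf '%s\n' "$name"
}
```

For example, name=$(backup_core mycorename /path/to/backup) starts a backup; you can then watch its status with curl "http://localhost:8983/solr/mycorename/replication?command=details".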
Properly managing cores is essential for organizing different types of data and optimizing search performance. Consider creating separate cores for different document types or applications to maintain clean separation and enable targeted optimization.
Indexing Data into Solr
After setting up your Solr cores, the next step is indexing data. Solr provides multiple methods for adding documents to your index.
The simplest way to index basic documents is using the Solr Post Tool:
cd /opt/solr/bin
./post -c mycorename /path/to/documents/*.pdf
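This command indexes all PDF files in the specified directory. The post tool supports various formats, including JSON, XML, and CSV. Binary formats such as PDF and Word documents are parsed by Solr Cell, which has to be enabled before they can be indexed (in Solr 9 it ships as the extraction module).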
For JSON documents, you can use curl:
curl -X POST -H "Content-Type: application/json" --data-binary @documents.json "http://localhost:8983/solr/mycorename/update?commit=true"
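For large-scale imports from a database, Solr 8.x and earlier offered the DataImportHandler (DIH). DIH was deprecated in Solr 8.6 and removed in Solr 9, so on a 9.x installation it is only available as a separately maintained third-party package. If you use it, first configure DIH in your core’s solrconfig.xml: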
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>
Then create a data-config.xml file in your core’s conf directory:
<dataConfig>
  <dataSource type="JdbcDataSource" driver="org.postgresql.Driver" url="jdbc:postgresql://localhost/database" user="username" password="password"/>
  <document>
    <entity name="item" query="SELECT * FROM items">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <field column="description" name="description"/>
    </entity>
  </document>
</dataConfig>
Trigger the import through the Solr admin interface or with curl:
curl "http://localhost:8983/solr/mycorename/dataimport?command=full-import"
For ongoing updates, consider setting up delta imports with a query that identifies only changed records.
When indexing large volumes of data, optimize for performance:
- Use batch updates rather than individual documents
- Increase the commitWithin parameter to reduce commit frequency
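- Consider disabling autoSoftCommit (or lengthening its interval) during bulk imports, but keep the hard autoCommit with openSearcher=false so the transaction log stays small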
- Monitor memory usage and adjust as needed
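To put the first two recommendations into practice, you can split a large export into batches and post each batch with commitWithin. This sketch assumes the export is newline-delimited JSON with one complete document object per line; split_into_batches and post_batches are hypothetical helper names, and the batch size and the 10-second commitWithin value are only starting points:

```shell
#!/usr/bin/env bash
# Split newline-delimited JSON (one complete document object per line) into
# files that each hold a JSON array of at most SIZE documents.
split_into_batches() {
  local input=$1 size=$2 outdir=$3
  mkdir -p "$outdir" || return 1
  awk -v size="$size" -v dir="$outdir" '
    NF == 0 { next }                     # ignore blank lines
    {
      if (n % size == 0) {               # time to start a new batch file
        if (file) { print "]" > file; close(file) }
        file = sprintf("%s/batch-%05d.json", dir, n / size)
        printf "[" > file
        sep = ""
      }
      printf "%s%s", sep, $0 > file
      sep = ","
      n++
    }
    END { if (file) { print "]" > file; close(file) } }
  ' "$input"
}

# POST every batch to the core's update handler; commitWithin (ms) lets Solr
# group the batches into fewer commits instead of committing after each request.
post_batches() {
  local outdir=$1 core=$2 solr=${3:-http://localhost:8983/solr} f
  for f in "$outdir"/batch-*.json; do
    [ -e "$f" ] || continue              # no batches: the glob did not match
    curl -fsS -H 'Content-Type: application/json' --data-binary "@$f" \
      "$solr/$core/update?commitWithin=10000" >/dev/null || return 1
  done
}
```

Usage: split_into_batches documents.ndjson 1000 batches && post_batches batches mycorename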
Monitor indexing progress through the Solr admin UI or by checking the logs (on a service installation they are written to /var/solr/logs):
tail -f /var/solr/logs/solr.log
Properly indexing your data is crucial for search quality. Take time to understand your document structure and configure appropriate field types and analyzers in your schema to ensure optimal search results.
Accessing and Using the Solr Admin UI
Solr provides a comprehensive web-based administration interface that allows you to manage cores, execute queries, and analyze your search configuration. To access the Solr Admin UI, open your web browser and navigate to:
http://localhost:8983/solr/
If you’ve set up a reverse proxy as described earlier, use your custom domain instead.
The dashboard provides an overview of your Solr installation, including:
- JVM memory usage
- System information
- Solr version and uptime
- Loaded cores list
The left navigation panel offers access to various tools:
Core Selector: Allows you to switch between different cores.
Core Admin: Manage your cores – create, reload, rename, swap, unload, or delete cores.
Java Properties: View all Java system properties of the Solr server.
Thread Dump: Examine the state of all threads on the server, useful for debugging.
Collection-specific tools:
- Dashboard: Provides core-specific statistics
- Query: Execute searches and view results
- Schema: Browse and edit field definitions
- Analysis: Test how text is processed by different field types
- Dataimport: Control document import processes (shown only when the DataImportHandler is configured)
- Documents: Add documents to the index manually
The Query interface is particularly useful for testing and developing your search functionality. It allows you to:
- Build queries with different parameters
- See the parsed query structure
- View timing information
- Experiment with faceting, highlighting, and other features
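Once you have settled on parameters in the Query screen, the same requests can be issued from the command line. A minimal sketch that URL-encodes every parameter so quotes, spaces, and colons reach Solr intact; solr_query is a hypothetical helper, SOLR_URL an assumed optional override, and the field names in the usage example are placeholders for fields in your own schema:

```shell
#!/usr/bin/env bash
# Query a core's /select handler. The query and every extra key=value pair are
# URL-encoded by curl, so quotes, spaces and colons reach Solr intact.
solr_query() {
  local core=$1 q=$2 solr=${SOLR_URL:-http://localhost:8983/solr} p
  shift 2
  local args=(--data-urlencode "q=$q")
  for p in "$@"; do
    args+=(--data-urlencode "$p")
  done
  # --get appends the encoded pairs to the URL as a query string
  curl -fsS --get "$solr/$core/select" "${args[@]}"
}
```

Usage: solr_query mycorename 'title:"apache solr"' rows=5 facet=true facet.field=category hl=true hl.fl=description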
For security reasons, consider adding authentication to the Admin UI in production environments. This can be accomplished through your reverse proxy configuration or by enabling Solr’s security features.
The Admin UI is an invaluable tool for developing and maintaining your Solr installation, providing visual feedback on configuration changes and enabling quick experimentation without writing code.
Congratulations! You have successfully installed Apache Solr. Thanks for using this tutorial for installing Apache Solr on your Manjaro system. For additional help or useful information, we recommend you check the official Apache Solr website.