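1. Project Overview

Operating Java services means starting and restarting them constantly. Doing this by hand, one service at a time, is slow and error-prone. This article shares two practical scripts: the first starts a single Java service quickly, and the second automates restarting several services in one batch.

I built both scripts while running financial systems in production. They have been in use there for three years and handle around 200 service operations a day. They cut down manual intervention considerably and are a particularly good fit for managing many instances in a microservice architecture.

2. Single-Service Startup Script

2.1 Basic Script Skeleton

The most basic Java service startup script contains the following core elements:

    #!/bin/bash
    # Java 8 replaced PermGen with Metaspace, so MaxMetaspaceSize is used
    # here instead of the obsolete MaxPermSize.
    JAVA_OPTS="-Xms512m -Xmx1024m -XX:MaxMetaspaceSize=256m"
    APP_HOME="/opt/myapp"
    APP_JAR="service-core.jar"
    LOG_FILE="$APP_HOME/logs/startup.log"

    # Start in the background and send stdout and stderr to the log file
    nohup java $JAVA_OPTS -jar "$APP_HOME/$APP_JAR" > "$LOG_FILE" 2>&1 &
    # Record the PID of the background process
    echo $! > "$APP_HOME/pid.file"

Key parameters:

- JAVA_OPTS: tune the memory settings to the service. As a starting point, set the initial heap (Xms) to about 50% of the maximum heap (Xmx); the production examples later in this article set the two equal so the heap never has to resize at runtime.
- nohup and &: keep the service running in the background after the terminal session ends.
- echo $!: records the process ID so the service can be managed later.

2.2 Enhanced Startup Script

For production, I recommend the following enhanced version:

    #!/bin/bash

    # Argument check
    if [ $# -lt 1 ]; then
        echo "Usage: $0 {start|stop|restart|status}"
        exit 1
    fi

    # Environment
    APP_NAME="OrderService"
    JAVA_HOME="/usr/java/jdk1.8.0_281"
    JAVA_OPTS="-server -Xms2g -Xmx2g -XX:+UseG1GC -XX:MaxGCPauseMillis=200"
    APP_HOME="/opt/services/order"
    APP_JAR="order-service-1.0.0.jar"
    PID_FILE="$APP_HOME/pid.file"
    LOG_DIR="$APP_HOME/logs"

    # Create the log directory
    mkdir -p "$LOG_DIR"

    case "$1" in
        start)
            if [ -f "$PID_FILE" ]; then
                PID=$(cat "$PID_FILE")
                if ps -p "$PID" > /dev/null; then
                    echo "$APP_NAME is already running (pid: $PID)"
                    exit 0
                fi
            fi
            echo "Starting $APP_NAME..."
            # JAVA_OPTS stays unquoted so it splits into separate options
            nohup "$JAVA_HOME/bin/java" $JAVA_OPTS -jar "$APP_HOME/$APP_JAR" > "$LOG_DIR/console.log" 2>&1 &
            echo $! > "$PID_FILE"
            echo "Started $APP_NAME (pid: $!)"
            ;;
        stop)
            if [ ! -f "$PID_FILE" ]; then
                echo "$APP_NAME is not running (pidfile not found)"
                exit 1
            fi
            PID=$(cat "$PID_FILE")
            echo "Stopping $APP_NAME (pid: $PID)..."
            kill "$PID"
            rm -f "$PID_FILE"
            echo "Stopped $APP_NAME"
            ;;
        restart)
            "$0" stop
            sleep 5
            "$0" start
            ;;
        status)
            if [ -f "$PID_FILE" ]; then
                PID=$(cat "$PID_FILE")
                if ps -p "$PID" > /dev/null; then
                    echo "$APP_NAME is running (pid: $PID)"
                else
                    echo "$APP_NAME pid file exists but process not found"
                fi
            else
                echo "$APP_NAME is not running"
            fi
            ;;
        *)
            echo "Invalid argument: $1"
            exit 1
            ;;
    esac

Important: make sure the script begins with a correct shebang line (#!/bin/bash) and is saved with Unix (LF) line endings. A script edited on Windows usually carries CRLF line endings, which cause errors such as "/bin/bash^M: bad interpreter" when it runs on Linux. Convert such files with dos2unix or sed -i 's/\r$//' before deploying them.

3. Multi-Service Restart Script Design

3.1 Basic Batch Restart

For a group of services that has to be managed together, a service list works well. Each entry holds the service name, its working directory, and its JAR file, separated by colons:

    #!/bin/bash

    SERVICES=(
        "order-service:/opt/services/order:order-service.jar"
        "payment-service:/opt/services/payment:payment-service.jar"
        "inventory-service:/opt/services/inventory:inventory-service.jar"
    )

    for service in "${SERVICES[@]}"; do
        # Split "name:path:jar" into three variables
        IFS=: read -r name path jar <<< "$service"
        echo "Restarting $name..."
        # Skip the service if its directory is missing, so nothing runs in the wrong place
        cd "$path" || { echo "Cannot enter $path, skipping $name"; continue; }

        # Stop the running instance
        if [ -f pid.file ]; then
            pid=$(cat pid.file)
            kill "$pid"
            rm pid.file
            sleep 3
        fi

        # Start a new instance
        nohup java -jar "$jar" > /dev/null 2>&1 &
        echo $! > pid.file
        echo "$name restarted (pid: $!)"
    done

3.2 Advanced Batch Management Script

An enhanced version with health checks:

    #!/bin/bash

    # Health-check timeout in seconds
    HEALTH_CHECK_TIMEOUT=60

    # Service name -> restart script (bash does not guarantee iteration order
    # for associative arrays)
    declare -A SERVICES=(
        [order]="/opt/services/order/restart.sh"
        [payment]="/opt/services/payment/restart.sh"
        [inventory]="/opt/services/inventory/restart.sh"
    )

    # Poll the health URL until it returns HTTP 200 or the timeout expires
    function health_check() {
        local service_name=$1
        local health_url=$2
        local start_time=$(date +%s)

        while : ; do
            http_code=$(curl -s -o /dev/null -w "%{http_code}" "$health_url")
            if [ "$http_code" = "200" ]; then
                return 0
            fi
            if [ $(($(date +%s) - start_time)) -gt $HEALTH_CHECK_TIMEOUT ]; then
                return 1
            fi
            sleep 2
        done
    }

    for service in "${!SERVICES[@]}"; do
        script=${SERVICES[$service]}
        echo "[$(date +'%Y-%m-%d %H:%M:%S')] Restarting $service"

        # Run the restart script
        if ! bash "$script"; then
            echo "Restart of $service failed"
            continue
        fi

        # Health check
        if health_check "$service" "http://localhost:8080/$service/health"; then
            echo "$service restarted and passed the health check"
        else
            echo "$service restart timed out or failed the health check"
        fi
    done

4. Production Hardening Recommendations

4.1 Security Hardening

Access control:

    # Create a dedicated user
    useradd -r -s /bin/false appuser
    chown -R appuser:appuser /opt/services

Resource limits:

    # Add to /etc/security/limits.conf
    appuser soft nofile 65535
    appuser hard nofile 65535

Log rotation:

    # Example /etc/logrotate.d/myapp configuration
    /opt/services/*/logs/*.log {
        daily
        rotate 30
        compress
        missingok
        notifempty
        copytruncate
    }

4.2 Performance Tuning Parameters

Adjust the JVM parameters to the type of service.

Suggested configuration for web services:

    JAVA_OPTS="-server -Xms4g -Xmx4g -XX:+UseG1GC \
      -XX:MaxGCPauseMillis=200 -XX:ParallelGCThreads=4 \
      -XX:ConcGCThreads=2 -XX:InitiatingHeapOccupancyPercent=70"

Suggested configuration for batch services:

    JAVA_OPTS="-server -Xms8g -Xmx8g -XX:+UseParallelGC \
      -XX:ParallelGCThreads=8 -XX:+UseParallelOldGC \
      -XX:MaxGCPauseMillis=500"

The -XX:+UseParallelOldGC flag applies to JDK 8, which the scripts above use. Newer JDKs deprecated and then removed it, so drop it when upgrading.

5. Troubleshooting

5.1 Typical Errors and Fixes

Port conflict:

    # Check which process holds the port
    netstat -tulnp | grep 8080
    # Look up that process
    ps -ef | grep <pid>

Memory leak:

    # Quick heap histogram (the :live option triggers a full GC first)
    jmap -histo:live <pid> | head -20

Startup timeout:

    # Start in the background, then wait up to 60 seconds for the health
    # endpoint. Wrapping java itself in timeout would kill a healthy service
    # once the 60 seconds elapse.
    java -jar your-app.jar > console.log 2>&1 &
    APP_PID=$!
    if ! timeout 60s bash -c 'until curl -sf http://localhost:8080/actuator/health > /dev/null; do sleep 1; done'; then
        echo "Startup timed out"
        kill "$APP_PID"
        exit 1
    fi

5.2 Log Analysis Tips

Extracting key errors:

    # Fields are split on whitespace; pass -F to awk if your log uses another delimiter
    grep -E "ERROR|Exception" application.log | awk '{print $1,$2,$3,$NF}' | sort | uniq -c | sort -nr

Locating performance problems:

    # Rank requests by latency (the log must contain a response-time field)
    awk '/GET/ {print $(NF-1)"ms " $7}' access.log | sort -n | tail -20

6. Advanced Extensions

6.1 CI/CD Integration

Example usage in a Jenkins pipeline:

    pipeline {
        agent any
        stages {
            stage('Restart Services') {
                steps {
                    script {
                        def services = ['order', 'payment', 'inventory']
                        services.each { service ->
                            sshagent(['prod-server']) {
                                sh "ssh user@prod-server /opt/scripts/restart-service.sh ${service}"
                            }
                        }
                    }
                }
            }
        }
    }

6.2 Containerization

Example Docker Compose setup:

    version: '3'
    services:
      order-service:
        image: myrepo/order-service:1.2.0
        restart: unless-stopped
        ports:
          - "8080:8080"
        healthcheck:
          test: ["CMD", "curl", "-f", "http://localhost:8080/actuator/health"]
          interval: 30s
          timeout: 10s
          retries: 3
      payment-service:
        image: myrepo/payment-service:1.1.3
        restart: unless-stopped
        depends_on:
          - order-service

The corresponding restart commands:

    # Restart a single service
    docker-compose restart order-service
    # Restart all services
    docker-compose down && docker-compose up -d

In practice, I recommend keeping both scripts in a single operations directory such as /opt/scripts/ and using SSH keys for passwordless access. For large clusters, consider extending them with an automation tool such as Ansible.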
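One refinement worth considering for the stop branch in section 2.2: it sends kill and removes the PID file straight away, and restart then relies on a fixed sleep 5, so a JVM that shuts down slowly may still be running when the new instance starts. The sketch below shows one way to wait for the process to exit and fall back to SIGKILL only after a timeout. The function name stop_gracefully and the 30-second default are illustrative assumptions, not part of the original scripts.

```shell
#!/bin/bash
# Hypothetical helper (name and defaults are assumptions for illustration).
# Sends SIGTERM to a PID, waits up to a timeout for it to exit, and only
# then falls back to SIGKILL.
# Returns 0 if the process exited on its own, 1 if SIGKILL was needed.
stop_gracefully() {
    local pid=$1
    local timeout=${2:-30}   # seconds to wait before SIGKILL (assumed default)
    local waited=0

    # Nothing to do if the process is already gone
    kill -0 "$pid" 2>/dev/null || return 0

    # SIGTERM gives the JVM a chance to run its shutdown hooks
    kill "$pid" 2>/dev/null

    # kill -0 sends no signal; it only tests whether the PID still exists
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "pid $pid still alive after ${timeout}s, sending SIGKILL" >&2
            kill -9 "$pid" 2>/dev/null
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

In the stop branch, the bare kill could then become stop_gracefully "$PID" 30 followed by rm -f "$PID_FILE", which would also let restart drop its fixed sleep.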