A Comprehensive Guide to Java Regular Expressions, with Example Code

An In-Depth Guide to Java Regular Expressions

Introduction

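Regular expressions are a powerful text-matching tool, widely used for searching and replacing text in strings. In Java, regular expressions are handled by the Pattern and Matcher classes in the java.util.regex package. This article takes a closer look at the symbols and constructs that make up a regular expression, including curly braces, parentheses, square brackets, and the start and end anchors.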

Building Blocks of a Regular Expression

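1. Character classes

  • Square brackets [ ]: define a set of characters. For example, [abc] matches "a", "b", or "c".
  • Predefined character classes: for example \d (any digit), \s (a whitespace character), and \w (a word character: a letter, a digit, or an underscore). In a Java string literal the backslash must be doubled, as in "\\d".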
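
The character classes above can be tried out with a small sketch like the following. The class name CharClassDemo and the helper fullMatch are made up for illustration; Pattern.matches is the standard library call that tests whether the whole input matches the expression.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class CharClassDemo {
    // Returns true when the WHOLE input matches the given regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("[abc]", "b")); // true: "b" is in the set
        System.out.println(fullMatch("[abc]", "d")); // false: "d" is not in the set
        System.out.println(fullMatch("\\d", "7"));   // true: a digit
        System.out.println(fullMatch("\\s", " "));   // true: a whitespace character
        System.out.println(fullMatch("\\w", "_"));   // true: underscore counts as a word character
        System.out.println(fullMatch("\\w", "-"));   // false: hyphen is not a word character
    }
}
```

Note the doubled backslashes: the Java compiler consumes one level of escaping before the regex engine sees the pattern.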

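2. Quantifiers

  • Asterisk *: matches zero or more times.
  • Plus sign +: matches one or more times.
  • Question mark ?: matches zero or one time.
  • Curly braces { }: match a custom number of times. For example, X{2} (X exactly twice), X{2,} (at least twice), X{2,5} (two to five times).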
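
As a rough illustration of the quantifiers listed above, the sketch below checks a few inputs against each one. QuantifierDemo and fullMatch are hypothetical names; String.matches requires the entire string to match.

```java
// Hypothetical demo class; the names are illustrative only.
class QuantifierDemo {
    // String.matches succeeds only if the entire string matches the regex.
    static boolean fullMatch(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("X*", ""));           // true: zero occurrences allowed
        System.out.println(fullMatch("X+", ""));           // false: at least one X required
        System.out.println(fullMatch("X?", "XX"));         // false: at most one X allowed
        System.out.println(fullMatch("X{2}", "XX"));       // true: exactly two
        System.out.println(fullMatch("X{2,}", "XXXX"));    // true: two or more
        System.out.println(fullMatch("X{2,5}", "XXXXXX")); // false: six is more than five
    }
}
```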

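3. Boundary matchers

  • Caret ^: matches the start of the input string (or the start of each line when Pattern.MULTILINE is set).
  • Dollar sign $: matches the end of the input string (or the end of each line when Pattern.MULTILINE is set).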
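
The effect of the two anchors can be seen in a short sketch such as this one. AnchorDemo and found are illustrative names; find() looks for a match anywhere in the input, so the anchors are what pin the match to the start or end.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class AnchorDemo {
    // Returns true if the regex matches somewhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("^The", "The end"));      // true: "The" is at the start
        System.out.println(found("^The", "At The end"));   // false: "The" is not at the start
        System.out.println(found("end$", "The end"));      // true: "end" is at the end
        System.out.println(found("end$", "The end."));     // false: a "." follows "end"
        System.out.println(found("^The end$", "The end")); // true: the whole input matches
    }
}
```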

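4. Grouping and capturing

  • Parentheses ( ): mark the start and end of a subexpression and capture the text it matches. For example, (abc) matches "abc" and stores it as group 1.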
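
To show what capturing means in practice, here is a minimal sketch that pulls the two halves of a "key=value" string out of separate groups. GroupDemo, splitPair, and the key=value input format are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names and the key=value format are illustrative only.
class GroupDemo {
    // Two capturing groups: group 1 is the key, group 2 is the value.
    private static final Pattern PAIR = Pattern.compile("(\\w+)=(\\w+)");

    // Returns {key, value}, or null when the input is not of the form key=value.
    static String[] splitPair(String input) {
        Matcher m = PAIR.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] pair = splitPair("color=red");
        System.out.println(pair[0] + " -> " + pair[1]); // color -> red
    }
}
```

Group 0 always refers to the whole match; numbered groups start at 1 and are counted by their opening parenthesis, left to right.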

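5. Special constructs

  • Non-capturing group (?: ): groups the enclosed expression for matching but does not capture the matched text.
  • Positive lookahead (?= ): succeeds if the characters that follow match the enclosed expression; the lookahead itself consumes no characters.
  • Negative lookahead (?! ): succeeds if the characters that follow do not match the enclosed expression.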
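
The three constructs above can be compared side by side in a sketch like the following. LookaroundDemo and its helper methods are hypothetical; Matcher.groupCount, find, group, and start are standard java.util.regex calls.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class LookaroundDemo {
    // Number of capturing groups declared in the regex.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Text of the first match, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    // Start index of the first match, or -1 if there is none.
    static int firstIndex(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(groupCount("a(bc)*"));   // 1: the group captures
        System.out.println(groupCount("a(?:bc)*")); // 0: the group does not capture
        // Positive lookahead: digits only if followed by "px"; "px" is not part of the match.
        System.out.println(firstMatch("\\d+(?=px)", "width 100px")); // 100
        // Negative lookahead: skips the "foo" in "foobar", finds the one in "foobaz".
        System.out.println(firstIndex("foo(?!bar)", "foobar foobaz")); // 7
    }
}
```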

Practical Examples

Example 1: Using curly braces

// \d{2,4} matches a run of two to four digits
Pattern pattern = Pattern.compile("\\d{2,4}");
Matcher matcher = pattern.matcher("123");
if (matcher.find()) {
    System.out.println("Match found: " + matcher.group());
}

Example 2: Using parentheses for grouping

// Group 1 captures the digits, group 2 the single lowercase letter that follows
Pattern pattern = Pattern.compile("(\\d+)([a-z])");
Matcher matcher = pattern.matcher("123abc");
while (matcher.find()) {
    System.out.println("Group 1: " + matcher.group(1));
    System.out.println("Group 2: " + matcher.group(2));
}

Example 3: Using boundary matchers

Pattern pattern = Pattern.compile("^The");
Matcher matcher = pattern.matcher("The end");
if (matcher.find()) {
    System.out.println("Match found at the start of string");
}

Example 4: Using a non-capturing group

Pattern pattern = Pattern.compile("a(?:bc)*");
Matcher matcher = pattern.matcher("abcbcbc");
if (matcher.find()) {
    System.out.println("Match found: " + matcher.group());
}

Example 5: Using a positive lookahead

// Matches a digit only when the next character is a non-digit (\D)
Pattern pattern = Pattern.compile("\\d(?=\\D)");
Matcher matcher = pattern.matcher("123a");
while (matcher.find()) {
    System.out.println("Match found: " + matcher.group());
}

More Code Examples

1. Matching Chinese characters

// \u4e00-\u9fa5 covers the commonly used CJK ideograph range
Pattern pattern = Pattern.compile("[\\u4e00-\\u9fa5]");
Matcher matcher = pattern.matcher("这是中文测试");
if (matcher.find()) {
    System.out.println("Contains Chinese characters");
}

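2. Matching double-byte characters (including Chinese characters)

// Matches any character outside the range \x00-\xff
Pattern pattern = Pattern.compile("[^\\x00-\\xff]");
Matcher matcher = pattern.matcher("测试ABC");
while (matcher.find()) {
    System.out.println("Double-byte character found: " + matcher.group());
}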

3. Matching empty lines

// MULTILINE makes ^ and $ match at the start and end of every line
Pattern pattern = Pattern.compile("^\\s*$", Pattern.MULTILINE);
Matcher matcher = pattern.matcher("First line\n\nThird line");
while (matcher.find()) {
    System.out.println("Empty line found at index: " + matcher.start());
}

4. Matching HTML tags

// Quotes inside the pattern must be escaped in the Java string literal
Pattern pattern = Pattern.compile("<(\"[^\"]*\"|'[^']*'|[^'\">])*>");
Matcher matcher = pattern.matcher("<html><head></head><body></body></html>");
while (matcher.find()) {
    System.out.println("HTML tag found: " + matcher.group());
}

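5. Matching leading and trailing whitespace (trimming)

String input = "  Hello World  ";
// Removes whitespace at the start (^\s+) or at the end (\s+$)
String result = input.replaceAll("^\\s+|\\s+$", "");
System.out.println("Trimmed String: " + result);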

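6. Matching IP addresses

// Checks the dotted-quad shape only; it does not verify that each part is in 0-255
Pattern pattern = Pattern.compile("\\b(?:\\d{1,3}\\.){3}\\d{1,3}\\b");
Matcher matcher = pattern.matcher("192.168.1.1");
if (matcher.find()) {
    System.out.println("Valid IP address: " + matcher.group());
}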
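
The pattern above only checks the shape of the address, so a string such as "999.999.999.999" would also pass. One possible way to tighten this, sketched here under the assumption that only plain dotted-quad IPv4 strings need validating, is to capture each part and check its numeric range in code. The class Ipv4Check and the method isValidIpv4 are names invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative only.
class Ipv4Check {
    // Four captured groups of one to three digits, separated by literal dots.
    private static final Pattern DOTTED_QUAD =
            Pattern.compile("(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})");

    // True only if the whole input is a dotted quad with every part in 0-255.
    static boolean isValidIpv4(String input) {
        Matcher m = DOTTED_QUAD.matcher(input);
        if (!m.matches()) {
            return false;
        }
        for (int i = 1; i <= 4; i++) {
            if (Integer.parseInt(m.group(i)) > 255) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidIpv4("192.168.1.1")); // true
        System.out.println(isValidIpv4("999.1.1.1"));   // false: 999 is out of range
        System.out.println(isValidIpv4("1.2.3"));       // false: only three parts
    }
}
```

Doing the range check in Java keeps the regular expression short; encoding 0-255 inside the pattern itself is possible but much harder to read.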

7. Matching email addresses

// The dot before the top-level domain is escaped so that it matches a literal "."
Pattern pattern = Pattern.compile("\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Z]{2,6}\\b", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher("example@email.com");
if (matcher.find()) {
    System.out.println("Valid email address: " + matcher.group());
}

8. Matching URLs

// The hyphen is placed last in each character class so it is read as a literal "-"
Pattern pattern = Pattern.compile("https?://[\\w.-]+(?:/[\\w./?%&=-]*)?");
Matcher matcher = pattern.matcher("http://www.example.com");
if (matcher.find()) {
    System.out.println("Valid URL: " + matcher.group());
}

9. Matching non-negative integers

Pattern pattern = Pattern.compile("\\b\\d+\\b");
Matcher matcher = pattern.matcher("123");
if (matcher.find()) {
    System.out.println("Non-negative integer: " + matcher.group());
}

10. Matching positive integers

// The first digit must be 1-9, which rules out 0 and leading zeros
Pattern pattern = Pattern.compile("\\b[1-9]\\d*\\b");
Matcher matcher = pattern.matcher("123");
if (matcher.find()) {
    System.out.println("Positive integer: " + matcher.group());
}

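11. Matching non-positive integers

// No \b before "-": a minus sign at the start of the input has no word boundary in front of it
Pattern pattern = Pattern.compile("-[1-9]\\d*\\b|\\b0\\b");
Matcher matcher = pattern.matcher("-123");
if (matcher.find()) {
    System.out.println("Non-positive integer: " + matcher.group());
}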

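12. Matching negative integers

// No \b before "-": a minus sign at the start of the input has no word boundary in front of it
Pattern pattern = Pattern.compile("-[1-9]\\d*\\b");
Matcher matcher = pattern.matcher("-123");
if (matcher.find()) {
    System.out.println("Negative integer: " + matcher.group());
}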

13. Matching integers

// The optional sign comes before \b so that it is included in the match
Pattern pattern = Pattern.compile("-?\\b\\d+\\b");
Matcher matcher = pattern.matcher("-123");
if (matcher.find()) {
    System.out.println("Integer: " + matcher.group());
}

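14. Matching non-negative floating-point numbers

// Digits, optionally followed by a literal dot and more digits
Pattern pattern = Pattern.compile("\\b\\d+(\\.\\d+)?\\b");
Matcher matcher = pattern.matcher("123.45");
if (matcher.find()) {
    System.out.println("Non-negative floating-point number: " + matcher.group());
}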

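15. Matching positive floating-point numbers

// Note: because the first digit may be 0, this pattern also accepts 0 and 0.0
Pattern pattern = Pattern.compile("\\b[0-9]\\d*(\\.\\d+)?\\b");
Matcher matcher = pattern.matcher("123.45");
if (matcher.find()) {
    System.out.println("Positive floating-point number: " + matcher.group());
}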

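16. Matching non-positive floating-point numbers

// Either a negative number, or zero written as 0, 0.0, 0.00, and so on
Pattern pattern = Pattern.compile("-\\d+(\\.\\d+)?\\b|\\b0(\\.0+)?\\b");
Matcher matcher = pattern.matcher("-123.45");
if (matcher.find()) {
    System.out.println("Non-positive floating-point number: " + matcher.group());
}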

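17. Matching negative floating-point numbers

// No \b before "-": a minus sign at the start of the input has no word boundary in front of it
Pattern pattern = Pattern.compile("-(\\d+(\\.\\d+)?)\\b");
Matcher matcher = pattern.matcher("-123.45");
if (matcher.find()) {
    System.out.println("Negative floating-point number: " + matcher.group());
}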

18. Matching strings of English letters

Pattern pattern = Pattern.compile("[A-Za-z]+");
Matcher matcher = pattern.matcher("HelloWorld");
if (matcher.find()) {
    System.out.println("English string: " + matcher.group());
}

19. Matching strings of uppercase English letters

Pattern pattern = Pattern.compile("[A-Z]+");
Matcher matcher = pattern.matcher("HELLO");
if (matcher.find()) {
    System.out.println("Uppercase English string: " + matcher.group());
}

20. Matching strings of lowercase English letters

Pattern pattern = Pattern.compile("[a-z]+");
Matcher matcher = pattern.matcher("hello");
if (matcher.find()) {
    System.out.println("Lowercase English string: " + matcher.group());
}

21. Matching strings of English letters and digits

Pattern pattern = Pattern.compile("[A-Za-z0-9]+");
Matcher matcher = pattern.matcher("Hello123");
if (matcher.find()) {
    System.out.println("Alphanumeric string: " + matcher.group());
}

22. Matching strings of letters, digits, and underscores

Pattern pattern = Pattern.compile("\\w+");
Matcher matcher = pattern.matcher("Hello_123");
if (matcher.find()) {
    System.out.println("Alphanumeric string with underscores: " + matcher.group());
}

23. Matching e-mail addresses

Pattern pattern = Pattern.compile("[\\w.-]+@[\\w.-]+\\.[A-Za-z]{2,}");
Matcher matcher = pattern.matcher("user@example.com");
if (matcher.find()) {
    System.out.println("Email address: " + matcher.group());
}

24. Matching URLs

// Scheme made of letters, then "://", then any run of non-whitespace characters
Pattern pattern = Pattern.compile("[a-zA-Z]+://\\S*");
Matcher matcher = pattern.matcher("http://www.example.com");
if (matcher.find()) {
    System.out.println("URL: " + matcher.group());
}

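25. Matching postal codes

// Five digits with an optional four-digit extension, as in the sample input
Pattern pattern = Pattern.compile("\\b\\d{5}(?:-\\d{4})?\\b");
Matcher matcher = pattern.matcher("12345-6789");
if (matcher.find()) {
    System.out.println("Postal code: " + matcher.group());
}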

26. Matching Chinese text

Pattern pattern = Pattern.compile("[\\u4e00-\\u9fa5]+");
Matcher matcher = pattern.matcher("这是一段中文");
if (matcher.find()) {
    System.out.println("Chinese text: " + matcher.group());
}

27. Matching telephone numbers

// Matches the form ddd-ddd-dddd
Pattern pattern = Pattern.compile("\\b\\d{3}-\\d{3}-\\d{4}\\b");
Matcher matcher = pattern.matcher("123-456-7890");
if (matcher.find()) {
    System.out.println("Phone number: " + matcher.group());
}

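28. Matching mobile phone numbers

// Eleven digits: a leading 1, a second digit from [34578], then nine more digits
Pattern pattern = Pattern.compile("\\b1[34578]\\d{9}\\b");
Matcher matcher = pattern.matcher("13812345678");
if (matcher.find()) {
    System.out.println("Mobile number: " + matcher.group());
}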

29. Matching double-byte characters

Pattern pattern = Pattern.compile("[^\\x00-\\xff]");
Matcher matcher = pattern.matcher("测试ABC");
while (matcher.find()) {
    System.out.println("Double-byte character: " + matcher.group());
}

30. Matching leading and trailing whitespace

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String text = "   Hello World!   ";
        Pattern pattern = Pattern.compile("^\\s+|\\s+$");
        Matcher matcher = pattern.matcher(text);

        // Replace the leading and trailing whitespace with nothing
        String result = matcher.replaceAll("");
        System.out.println("Original string: '" + text + "'");
        System.out.println("After removing leading and trailing whitespace: '" + result + "'");
    }
}

31. Matching Chinese characters

Pattern pattern = Pattern.compile("[\\u4e00-\\u9fa5]");
Matcher matcher = pattern.matcher("你好, World!");
while (matcher.find()) {
    System.out.println("Matched Chinese character: " + matcher.group());
}

32. Matching double-byte characters (including Chinese characters)

Pattern pattern = Pattern.compile("[^\\x00-\\xff]");
Matcher matcher = pattern.matcher("双字节字符测试abc");
while (matcher.find()) {
    System.out.println("Matched double-byte character: " + matcher.group());
}

33. Matching empty lines

Pattern pattern = Pattern.compile("^\\s*$", Pattern.MULTILINE);
Matcher matcher = pattern.matcher("First line\n\nThird line");
while (matcher.find()) {
    System.out.println("Matched empty line at index: " + matcher.start());
}

34. Matching HTML tags

// \1 is a backreference to the tag name captured by group 1
Pattern pattern = Pattern.compile("<(\\S*?)[^>]*>.*?</\\1>|<.*? />");
Matcher matcher = pattern.matcher("<html><head></head><body></body></html>");
while (matcher.find()) {
    System.out.println("Matched HTML tag: " + matcher.group());
}

35. Matching leading and trailing whitespace

Pattern pattern = Pattern.compile("^\\s+|\\s+$");
Matcher matcher = pattern.matcher("   Hello World!   ");
String result = matcher.replaceAll("");
System.out.println("String after removing leading and trailing spaces: " + result);

36. Matching IP addresses

// find() in a loop reports every address in the input
Pattern pattern = Pattern.compile("\\b(?:[0-9]{1,3}\\.){3}[0-9]{1,3}\\b");
Matcher matcher = pattern.matcher("192.168.1.1 and 10.0.0.1");
while (matcher.find()) {
    System.out.println("Matched IP Address: " + matcher.group());
}

37. Matching email addresses

Pattern pattern = Pattern.compile("[\\w.-]+@[\\w.-]+\\.[a-zA-Z]{2,6}");
Matcher matcher = pattern.matcher("email@example.com");
if (matcher.find()) {
    System.out.println("Matched Email: " + matcher.group());
}

38. Matching URLs

// The trailing "!" is not in the character class, so it is left out of the match
Pattern pattern = Pattern.compile("http[s]?://[\\w.]+[/\\w ./?%&=]*");
Matcher matcher = pattern.matcher("Visit https://www.example.com!");
while (matcher.find()) {
    System.out.println("Matched URL: " + matcher.group());
}

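Best Practices

  • Understand and test: regular expressions can get complicated, so it is important to understand their parts and to test how they behave.
  • Consider performance: regular expressions can affect application performance, especially when processing large amounts of text.
  • Avoid overuse: in some cases, simple string operations are more efficient than a complex regular expression.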
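
Two of the points above can be sketched in code: compiling a pattern once and reusing it, and reaching for a plain string method when one does the job. The class PrecompiledPatternDemo and its methods are hypothetical names; how much time this saves depends on the workload and is not measured here.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class PrecompiledPatternDemo {
    // Compiled once and reused, instead of recompiling the regex on every call.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Counts the tokens that consist entirely of digits.
    static int countNumericTokens(String[] tokens) {
        int count = 0;
        for (String token : tokens) {
            if (DIGITS.matcher(token).matches()) {
                count++;
            }
        }
        return count;
    }

    // Plain string operation instead of replaceAll("^\\s+|\\s+$", "").
    // Note: String.trim() strips characters up to U+0020, which is not exactly the same set as \s.
    static String trimPlain(String s) {
        return s.trim();
    }

    public static void main(String[] args) {
        System.out.println(countNumericTokens(new String[] { "12", "ab", "7", "3x" })); // 2
        System.out.println("'" + trimPlain("  Hello World  ") + "'"); // 'Hello World'
    }
}
```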

Conclusion

Regular expressions are a powerful tool that can be put to use in a wide range of string-processing scenarios.