Java Regular Expressions: A Comprehensive Walkthrough with Example Code
An In-Depth Guide to Java Regular Expressions
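Introduction
Regular expressions are a powerful text-matching tool, widely used for string searching, replacement, and similar operations. In Java, working with regular expressions centers on the Pattern and Matcher classes (in java.util.regex). This article takes a closer look at the symbols and patterns that regular expressions are built from, including curly braces, parentheses, square brackets, and the start and end anchors.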
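Building Blocks of Regular Expressions
1. Character classes
- Square brackets [ ]: define a character set. For example, [abc] matches "a", "b", or "c".
- Predefined character classes: such as \d (any digit), \s (a whitespace character), and \w (a word character: letter, digit, or underscore).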
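The character classes above can be tried out with a small helper. This is a minimal sketch: the class name CharClassDemo and the helper findAll are illustrative names, not part of the original article. Note that inside a Java string literal each regex backslash has to be doubled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
final class CharClassDemo {

    // Collects every substring of input that the regex matches, in order.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String input = "a1 b_2\tc";
        System.out.println(findAll("[abc]", input)); // [a, b, c]
        System.out.println(findAll("\\d", input));   // [1, 2]
        System.out.println(findAll("\\w+", input));  // [a1, b_2, c]
        // \s matches the space and the tab, so two hits are expected.
        System.out.println(findAll("\\s", input).size()); // 2
    }
}
```

The \w+ line shows that the underscore counts as a word character, which is easy to overlook.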
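2. Quantifiers
- Asterisk *: matches zero or more times.
- Plus sign +: matches one or more times.
- Question mark ?: matches zero or one time.
- Curly braces { }: match a custom number of times. For example, X{2} (X exactly twice), X{2,} (at least twice), X{2,5} (two to five times).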
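To make the quantifier rules concrete, the sketch below checks whole strings against each quantifier using Pattern.matches, which requires the entire input to match. The class name QuantifierDemo and the helper fullMatch are assumptions made for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
final class QuantifierDemo {

    // True only if the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("X*", ""));           // true: zero occurrences allowed
        System.out.println(fullMatch("X+", ""));           // false: at least one required
        System.out.println(fullMatch("X?", "XX"));         // false: at most one allowed
        System.out.println(fullMatch("X{2}", "XX"));       // true: exactly two
        System.out.println(fullMatch("X{2,}", "X"));       // false: fewer than two
        System.out.println(fullMatch("X{2,5}", "XXXXXX")); // false: six exceeds the upper bound
    }
}
```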
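3. Boundary matchers
- Caret ^: matches the start of the input string (or the start of each line when Pattern.MULTILINE is set).
- Dollar sign $: matches the end of the input string (or the end of each line when Pattern.MULTILINE is set).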
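The effect of Pattern.MULTILINE on the two anchors can be seen by counting matches with and without the flag. The following is a rough sketch; AnchorDemo and count are hypothetical names introduced only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
final class AnchorDemo {

    // Counts how many times the regex is found in input under the given flags.
    static int count(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "one\ntwo\nthree";
        // Without flags, ^ and $ anchor to the whole input.
        System.out.println(count("^\\w+", 0, text));                 // 1
        System.out.println(count("\\w+$", 0, text));                 // 1
        // With MULTILINE, they anchor to every line.
        System.out.println(count("^\\w+", Pattern.MULTILINE, text)); // 3
        System.out.println(count("\\w+$", Pattern.MULTILINE, text)); // 3
    }
}
```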
4. Grouping and capturing
- Parentheses ( ): mark the start and end of a subexpression. For example, (abc) matches "abc" and captures it as group 1.
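5. Special constructs
- Non-capturing group (?: ): groups the enclosed expression for matching but does not capture the result.
- Positive lookahead (?= ): succeeds if the text that follows matches the enclosed expression, without consuming it.
- Negative lookahead (?! ): succeeds if the text that follows does not match the enclosed expression.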
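A short sketch may help show how these three constructs behave. The class name SpecialConstructDemo and its helpers are assumptions for illustration; the inputs were chosen so that the lookaheads visibly filter the matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
final class SpecialConstructDemo {

    // Collects every substring of input that the regex matches, in order.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    // Number of capturing groups the regex defines.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        // Positive lookahead: numbers followed by "px"; "px" is not part of the match.
        System.out.println(findAll("\\d+(?=px)", "10px 20em 30px"));       // [10, 30]
        // Negative lookahead: words that do not start with "un".
        System.out.println(findAll("\\b(?!un)\\w+", "undo redo unzip zip")); // [redo, zip]
        // A non-capturing group does not add to the group count.
        System.out.println(groupCount("(?:ab)+(c)")); // 1
        System.out.println(groupCount("(ab)+(c)"));   // 2
    }
}
```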
Practical Examples
Example 1: Using curly braces
Pattern pattern = Pattern.compile("\\d{2,4}"); // two to four digits
Matcher matcher = pattern.matcher("123");
if (matcher.find()) {
    System.out.println("Match found: " + matcher.group());
}
Example 2: Using parentheses for grouping
Pattern pattern = Pattern.compile("(\\d+)([a-z])"); // group 1: digits, group 2: one lowercase letter
Matcher matcher = pattern.matcher("123abc");
while (matcher.find()) {
    System.out.println("Group 1: " + matcher.group(1));
    System.out.println("Group 2: " + matcher.group(2));
}
Example 3: Using boundary matchers
Pattern pattern = Pattern.compile("^The");
Matcher matcher = pattern.matcher("The end");
if (matcher.find()) {
    System.out.println("Match found at the start of string");
}
Example 4: Using a non-capturing group
Pattern pattern = Pattern.compile("a(?:bc)*");
Matcher matcher = pattern.matcher("abcbcbc");
if (matcher.find()) {
    System.out.println("Match found: " + matcher.group());
}
Example 5: Using a positive lookahead
Pattern pattern = Pattern.compile("\\d(?=\\D)"); // a digit followed by a non-digit
Matcher matcher = pattern.matcher("123a");
while (matcher.find()) {
    System.out.println("Match found: " + matcher.group());
}
More Code Examples
1. Match Chinese characters
Pattern pattern = Pattern.compile("[\\u4e00-\\u9fa5]");
Matcher matcher = pattern.matcher("这是中文测试");
if (matcher.find()) {
    System.out.println("Contains Chinese characters");
}
2. Match double-byte characters (including Chinese characters)
Pattern pattern = Pattern.compile("[^\\x00-\\xff]"); // any character outside the range 0x00-0xFF
Matcher matcher = pattern.matcher("测试ABC");
while (matcher.find()) {
    System.out.println("Double-byte character found: " + matcher.group());
}
3. Match empty lines
Pattern pattern = Pattern.compile("^\\s*$", Pattern.MULTILINE);
Matcher matcher = pattern.matcher("First line\n\nThird line");
while (matcher.find()) {
    System.out.println("Empty line found at index: " + matcher.start());
}
4. Match HTML tags
// Quotes inside the regex must be escaped in the Java string literal
Pattern pattern = Pattern.compile("<(\"[^\"]*\"|'[^']*'|[^'\">])*>");
Matcher matcher = pattern.matcher("<html><head></head><body></body></html>");
while (matcher.find()) {
    System.out.println("HTML tag found: " + matcher.group());
}
5. Match leading and trailing whitespace (to strip it)
String input = " Hello World ";
String result = input.replaceAll("^\\s+|\\s+$", "");
System.out.println("Trimmed String: " + result);
6. Match IP addresses
// Checks the dotted-quad shape only; it does not restrict each part to 0-255
Pattern pattern = Pattern.compile("\\b(?:\\d{1,3}\\.){3}\\d{1,3}\\b");
Matcher matcher = pattern.matcher("192.168.1.1");
if (matcher.find()) {
    System.out.println("Valid IP address: " + matcher.group());
}
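The pattern in this example accepts any one-to-three-digit group, so a string such as 999.999.999.999 would also be found. Where each part should be limited to 0-255, a stricter pattern could look like the sketch below. The class name Ipv4Validator is hypothetical, and rejecting leading zeros (for example "01") is a design choice of this sketch rather than something the article prescribes.

```java
import java.util.regex.Pattern;

// Hypothetical validator; the names and the leading-zero policy are illustrative assumptions.
final class Ipv4Validator {

    // One octet in the range 0-255, without leading zeros.
    private static final String OCTET = "(?:25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)";

    // Four octets separated by literal dots.
    private static final Pattern IPV4 =
            Pattern.compile(OCTET + "(?:\\." + OCTET + "){3}");

    // matches() requires the whole string to be an address, not just a substring.
    static boolean isValid(String candidate) {
        return IPV4.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("192.168.1.1"));     // true
        System.out.println(isValid("0.0.0.0"));         // true
        System.out.println(isValid("256.1.1.1"));       // false: 256 is out of range
        System.out.println(isValid("999.999.999.999")); // false
        System.out.println(isValid("1.2.3"));           // false: only three parts
    }
}
```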
7. Match email addresses
Pattern pattern = Pattern.compile("\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Z]{2,6}\\b", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher("example@email.com");
if (matcher.find()) {
    System.out.println("Valid email address: " + matcher.group());
}
8. Match URLs
// The hyphen is placed last in the character class so it is read as a literal
Pattern pattern = Pattern.compile("http[s]?://[\\w.-]+(?:/[\\w./?%&=-]*)?");
Matcher matcher = pattern.matcher("http://www.example.com");
if (matcher.find()) {
    System.out.println("Valid URL: " + matcher.group());
}
9. Match non-negative integers
Pattern pattern = Pattern.compile("\\b\\d+\\b");
Matcher matcher = pattern.matcher("123");
if (matcher.find()) {
    System.out.println("Non-negative integer: " + matcher.group());
}
10. Match positive integers
Pattern pattern = Pattern.compile("\\b[1-9]\\d*\\b");
Matcher matcher = pattern.matcher("123");
if (matcher.find()) {
    System.out.println("Positive integer: " + matcher.group());
}
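11. Match non-positive integers
// No \b before '-': there is no word boundary between the start of the input and '-'
Pattern pattern = Pattern.compile("-[1-9]\\d*\\b|\\b0\\b");
Matcher matcher = pattern.matcher("-123");
if (matcher.find()) {
    System.out.println("Non-positive integer: " + matcher.group());
}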
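12. Match negative integers
// No \b before '-': there is no word boundary between the start of the input and '-'
Pattern pattern = Pattern.compile("-[1-9]\\d*\\b");
Matcher matcher = pattern.matcher("-123");
if (matcher.find()) {
    System.out.println("Negative integer: " + matcher.group());
}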
13. Match integers
// The optional sign comes before \b so that it is included in the match
Pattern pattern = Pattern.compile("-?\\b\\d+\\b");
Matcher matcher = pattern.matcher("-123");
if (matcher.find()) {
    System.out.println("Integer: " + matcher.group());
}
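The examples in this list call find(), which succeeds as soon as any part of the input matches. When the goal is to validate that an entire string is an integer, matches() may be closer to the intent. The following sketch contrasts the two; the class name FindVsMatchesDemo and its method names are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
final class FindVsMatchesDemo {

    // Compiled once and reused; an optional minus sign followed by digits.
    private static final Pattern INTEGER = Pattern.compile("-?\\d+");

    // True if an integer appears anywhere in the input.
    static boolean containsInteger(String input) {
        return INTEGER.matcher(input).find();
    }

    // True only if the whole input is an integer.
    static boolean isInteger(String input) {
        return INTEGER.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(containsInteger("abc-123xyz")); // true
        System.out.println(isInteger("abc-123xyz"));       // false
        System.out.println(isInteger("-123"));             // true
        System.out.println(isInteger(""));                 // false
    }
}
```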
14. Match non-negative floating-point numbers
Pattern pattern = Pattern.compile("\\b\\d+(\\.\\d+)?\\b");
Matcher matcher = pattern.matcher("123.45");
if (matcher.find()) {
    System.out.println("Non-negative floating-point number: " + matcher.group());
}
15. Match positive floating-point numbers
// Note: as written, this pattern also accepts 0 and 0.0
Pattern pattern = Pattern.compile("\\b[0-9]\\d*(\\.\\d+)?\\b");
Matcher matcher = pattern.matcher("123.45");
if (matcher.find()) {
    System.out.println("Positive floating-point number: " + matcher.group());
}
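16. Match non-positive floating-point numbers
// No \b before '-': there is no word boundary between the start of the input and '-'
Pattern pattern = Pattern.compile("-(\\d+(\\.\\d+)?)\\b|\\b0(\\.0+)?\\b");
Matcher matcher = pattern.matcher("-123.45");
if (matcher.find()) {
    System.out.println("Non-positive floating-point number: " + matcher.group());
}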
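17. Match negative floating-point numbers
// No \b before '-': there is no word boundary between the start of the input and '-'
Pattern pattern = Pattern.compile("-([0-9]\\d*(\\.\\d+)?)\\b");
Matcher matcher = pattern.matcher("-123.45");
if (matcher.find()) {
    System.out.println("Negative floating-point number: " + matcher.group());
}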
18. Match English-letter strings
Pattern pattern = Pattern.compile("[A-Za-z]+");
Matcher matcher = pattern.matcher("HelloWorld");
if (matcher.find()) {
    System.out.println("English string: " + matcher.group());
}
19. Match uppercase English strings
Pattern pattern = Pattern.compile("[A-Z]+");
Matcher matcher = pattern.matcher("HELLO");
if (matcher.find()) {
    System.out.println("Uppercase English string: " + matcher.group());
}
20. Match lowercase English strings
Pattern pattern = Pattern.compile("[a-z]+");
Matcher matcher = pattern.matcher("hello");
if (matcher.find()) {
    System.out.println("Lowercase English string: " + matcher.group());
}
21. Match strings of English letters and digits
Pattern pattern = Pattern.compile("[A-Za-z0-9]+");
Matcher matcher = pattern.matcher("Hello123");
if (matcher.find()) {
    System.out.println("Alphanumeric string: " + matcher.group());
}
22. Match strings of letters, digits, and underscores
Pattern pattern = Pattern.compile("\\w+");
Matcher matcher = pattern.matcher("Hello_123");
if (matcher.find()) {
    System.out.println("Alphanumeric string with underscores: " + matcher.group());
}
23. Match e-mail addresses
Pattern pattern = Pattern.compile("[\\w.-]+@[\\w.-]+\\.[A-Za-z]{2,}");
Matcher matcher = pattern.matcher("user@example.com");
if (matcher.find()) {
    System.out.println("Email address: " + matcher.group());
}
24. Match URLs
// [a-zA-Z] rather than [a-zA-z]: the range A-z would also admit [, \, ], ^, _ and `
Pattern pattern = Pattern.compile("[a-zA-Z]+://[^\\s]*");
Matcher matcher = pattern.matcher("http://www.example.com");
if (matcher.find()) {
    System.out.println("URL: " + matcher.group());
}
25. Match postal codes
// Five digits with an optional four-digit suffix (US ZIP code format)
Pattern pattern = Pattern.compile("\\b\\d{5}(?:-\\d{4})?\\b");
Matcher matcher = pattern.matcher("12345-6789");
if (matcher.find()) {
    System.out.println("Postal code: " + matcher.group());
}
26. Match Chinese text
Pattern pattern = Pattern.compile("[\\u4e00-\\u9fa5]+");
Matcher matcher = pattern.matcher("这是一段中文");
if (matcher.find()) {
    System.out.println("Chinese text: " + matcher.group());
}
27. Match phone numbers
Pattern pattern = Pattern.compile("\\b\\d{3}-\\d{3}-\\d{4}\\b");
Matcher matcher = pattern.matcher("123-456-7890");
if (matcher.find()) {
    System.out.println("Phone number: " + matcher.group());
}
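28. Match mobile phone numbers
// 11 digits: a leading 1, then one of 3, 4, 5, 7, 8, then nine more digits
Pattern pattern = Pattern.compile("\\b1[34578]\\d{9}\\b");
Matcher matcher = pattern.matcher("13812345678");
if (matcher.find()) {
    System.out.println("Mobile number: " + matcher.group());
}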
29. Match double-byte characters
Pattern pattern = Pattern.compile("[^\\x00-\\xff]");
Matcher matcher = pattern.matcher("测试ABC");
while (matcher.find()) {
    System.out.println("Double-byte character: " + matcher.group());
}
30. Match leading and trailing whitespace
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String text = " Hello World! ";
        Pattern pattern = Pattern.compile("^\\s+|\\s+$");
        Matcher matcher = pattern.matcher(text);
        // Replace leading and trailing whitespace with the empty string
        String result = matcher.replaceAll("");
        System.out.println("Original string: '" + text + "'");
        System.out.println("After removing leading and trailing whitespace: '" + result + "'");
    }
}
31. Match Chinese characters
Pattern pattern = Pattern.compile("[\\u4e00-\\u9fa5]");
Matcher matcher = pattern.matcher("你好, World!");
while (matcher.find()) {
    System.out.println("Matched Chinese character: " + matcher.group());
}
32. Match double-byte characters (including Chinese characters)
Pattern pattern = Pattern.compile("[^\\x00-\\xff]");
Matcher matcher = pattern.matcher("双字节字符测试abc");
while (matcher.find()) {
    System.out.println("Matched double-byte character: " + matcher.group());
}
33. Match empty lines
Pattern pattern = Pattern.compile("^\\s*$", Pattern.MULTILINE);
Matcher matcher = pattern.matcher("First line\n\nThird line");
while (matcher.find()) {
    System.out.println("Matched empty line at index: " + matcher.start());
}
34. Match HTML tags
// \1 refers back to the tag name captured by group 1, so this matches a whole element
Pattern pattern = Pattern.compile("<(\\S*?)[^>]*>.*?</\\1>|<.*? />");
Matcher matcher = pattern.matcher("<html><head></head><body></body></html>");
while (matcher.find()) {
    System.out.println("Matched HTML tag: " + matcher.group());
}
35. Match leading and trailing whitespace
Pattern pattern = Pattern.compile("^\\s+|\\s+$");
Matcher matcher = pattern.matcher(" Hello World! ");
String result = matcher.replaceAll("");
System.out.println("String after removing leading and trailing spaces: " + result);
36. Match IP addresses
Pattern pattern = Pattern.compile("\\b(?:[0-9]{1,3}\\.){3}[0-9]{1,3}\\b");
Matcher matcher = pattern.matcher("192.168.1.1 and 10.0.0.1");
while (matcher.find()) {
    System.out.println("Matched IP Address: " + matcher.group());
}
37. Match email addresses
Pattern pattern = Pattern.compile("[\\w.-]+@[\\w.-]+\\.[a-zA-Z]{2,6}");
Matcher matcher = pattern.matcher("email@example.com");
if (matcher.find()) {
    System.out.println("Matched Email: " + matcher.group());
}
38. Match URLs
Pattern pattern = Pattern.compile("http[s]?://[\\w.]+[/\\w ./?%&=]*");
Matcher matcher = pattern.matcher("Visit https://www.example.com!");
while (matcher.find()) {
    System.out.println("Matched URL: " + matcher.group());
}
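Best Practices
- Understand and test: regular expressions can be complex, so it is important to understand their components and test how they behave.
- Consider performance: regular expressions can affect application performance, especially when processing large amounts of text.
- Avoid overuse: in some cases, simple string operations can be more efficient than a complex regular expression.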
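Two of these points can be sketched in code: compiling a Pattern once and reusing it instead of recompiling on every call, and preferring a plain string method where one does the job. The class name BestPracticeDemo and its method names are illustrative assumptions. Note that String.trim() and the \s class do not define whitespace identically, so the two methods agree on ordinary spaces, tabs, and newlines but are not interchangeable for every input.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
final class BestPracticeDemo {

    // Compiled once and shared, rather than calling Pattern.compile on every use.
    private static final Pattern EDGE_WHITESPACE = Pattern.compile("^\\s+|\\s+$");

    // Regex-based trimming using the precompiled pattern.
    static String trimWithRegex(String input) {
        return EDGE_WHITESPACE.matcher(input).replaceAll("");
    }

    // Plain string operation; no regular expression involved.
    static String trimPlain(String input) {
        return input.trim();
    }

    public static void main(String[] args) {
        String text = "  Hello World  ";
        System.out.println("'" + trimWithRegex(text) + "'"); // 'Hello World'
        System.out.println("'" + trimPlain(text) + "'");     // 'Hello World'
    }
}
```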
Conclusion
Regular expressions are a powerful tool that can be applied across a wide range of string-processing scenarios.