🔍Regex Recipes
Extract Hashtags
Extract social media hashtags with Unicode support for international characters.
Pattern
#[\w\u00C0-\u017F]+
Explanation
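Matches # followed by one or more word characters or characters in the range U+00C0 to U+017F. In JavaScript, \w covers only ASCII letters, digits, and underscore. The range adds the Latin-1 Supplement and Latin Extended-A blocks, which hold the common accented Latin letters; it also includes the × and ÷ signs. Add more Unicode ranges for other scripts.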
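A minimal sketch of adding ranges for other scripts, assuming you build the pattern from a list. The Greek and Cyrillic blocks below are illustrative choices, not part of the recipe. Unicode blocks can also contain a few non-letter symbols, which is one reason \p{L} (see the tips) is usually cleaner.

```javascript
// Sketch: build the hashtag pattern from a list of Unicode ranges.
// The Greek and Cyrillic entries are illustrative assumptions;
// pick the blocks your audience actually writes in.
const ranges = [
  '\\u00C0-\\u017F', // Latin-1 Supplement + Latin Extended-A (from the recipe)
  '\\u0370-\\u03FF', // Greek and Coptic
  '\\u0400-\\u04FF', // Cyrillic
];

// \w stays ASCII-only in JavaScript, so each extra script needs its own range
const hashtagPattern = new RegExp(`#[\\w${ranges.join('')}]+`, 'g');

const sample = 'Posts: #café #Москва #αθήνα #web3';
console.log(sample.match(hashtagPattern));
// → [ '#café', '#Москва', '#αθήνα', '#web3' ]
```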
Examples
Simple
Input
#javascript
Output
✓ Match
With numbers
Input
#web3
Output
✓ Match
Multiple
Input
Love #coding and #developer
Output
Matches both
With accents
Input
#café
Output
✓ Match (with Unicode range)
Not hashtag
Input
cost is #50
Output
✓ Match on #50 (false positive: a price, not a hashtag)
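To check the examples above yourself, a small sketch that runs each input through the recipe's pattern and collects what it finds:

```javascript
// Run the example inputs through the recipe's pattern.
const pattern = /#[\w\u00C0-\u017F]+/g;

const examples = [
  '#javascript',
  '#web3',
  'Love #coding and #developer',
  '#café',
  'cost is #50', // matches, although it is a price rather than a hashtag
];

// match() returns null when nothing matches, so fall back to an empty array
const results = examples.map((input) => input.match(pattern) || []);

results.forEach((found, i) => {
  console.log(JSON.stringify(examples[i]), '→', found);
});
```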
Code Examples
JavaScript
// Basic hashtag extraction
const text = 'Loving #JavaScript and #WebDev! #coding';
// match() returns null when there are no hashtags, so fall back to []
const hashtags = (text.match(/#[\w\u00C0-\u017F]+/g) || [])
  .map(tag => tag.substring(1)); // Remove the leading #
// Result: ['JavaScript', 'WebDev', 'coding']
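// Broader Unicode support: letters and digits in any script (CJK, Cyrillic, accented Latin).
// Emoji are neither letters nor numbers, so they are not matched. Requires the u flag.
const unicodeHashtags = /#[\p{L}\p{N}_]+/gu;
const tags = text.match(unicodeHashtags) || [];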
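A quick comparison of the basic pattern and the \p{L} pattern on mixed-script input. It also shows one gap in \p{L}: an accent typed as a separate combining mark (category \p{M}) is cut off. Adding \p{M}, as in the third pattern, is an assumption on our part, not part of the recipe; normalizing input to NFC first is an alternative.

```javascript
const basic = /#[\w\u00C0-\u017F]+/g;       // recipe pattern
const unicode = /#[\p{L}\p{N}_]+/gu;        // Unicode property pattern
const withMarks = /#[\p{L}\p{M}\p{N}_]+/gu; // also keeps combining marks (\p{M})

const mixed = '#東京 #Москва #café #🚀launch';
console.log(mixed.match(basic));   // → [ '#café' ]
console.log(mixed.match(unicode)); // → [ '#東京', '#Москва', '#café' ]

// "café" written as a plain "e" followed by U+0301 COMBINING ACUTE ACCENT
const decomposed = '#cafe\u0301';
console.log(decomposed.match(unicode));   // → [ '#cafe' ] (accent cut off)
console.log(decomposed.match(withMarks)); // full tag, accent included
```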
💡 Tips
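- Use \p{L} (with the u flag in JavaScript) to match letters in every script; add \p{M} if accents may arrive as separate combining marks
- Platform rules differ: Twitter hashtags, for example, allow letters, numbers, and underscores, but no spaces or punctuation, and must contain at least one letter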
- Consider max length (often 100-140 chars)
- Normalize to lowercase for comparison
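The tips above can be combined into one helper. This is a hypothetical sketch: the name normalizeHashtags, the 100-character default, and the rule that digits-only tags are rejected are assumptions to adjust per platform. Length is measured in UTF-16 code units.

```javascript
// Hypothetical helper: extract hashtags and normalize them for comparison.
// Assumed rules: NFC + lowercase, digits-only tags rejected, length cap.
function normalizeHashtags(text, maxLength = 100) {
  const pattern = /#([\p{L}\p{M}\p{N}_]+)/gu;
  const seen = new Set();
  for (const match of text.matchAll(pattern)) {
    // NFC merges "e" + combining accent into "é"; then compare case-insensitively
    const tag = match[1].normalize('NFC').toLowerCase();
    if (/^\p{N}+$/u.test(tag)) continue; // e.g. "#50" is a price, not a tag
    if (tag.length > maxLength) continue; // over the assumed length limit
    seen.add(tag); // the Set drops duplicates
  }
  return [...seen];
}

console.log(normalizeHashtags('#JavaScript #javascript #Cafe\u0301 #café #50'));
// → [ 'javascript', 'café' ]
```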
⚠️ Common Pitfalls
- May match # in non-hashtag contexts, such as URL fragments (page#section) or numbers (#50)
- Different platforms have different rules
- Unicode support varies by regex engine
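- \b does not help before #: since # is not a word character, \b# requires a word character before the #. Use a lookbehind such as (?<=^|\s) to require start of text or whitespace instead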
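A sketch of the lookbehind approach, assuming a tag must start the text or follow whitespace. Lookbehind needs an ES2018+ engine. The trade-off: tags right after punctuation, such as "(#tag)", are also skipped, so widen the lookbehind if that matters.

```javascript
// Require start-of-text or whitespace before '#', so fragments such as
// "docs#intro" and mid-word text such as "C#sharp" are skipped.
const standalone = /(?<=^|\s)#[\p{L}\p{M}\p{N}_]+/gu;

const post = 'See https://example.com/docs#intro and #regex, not C#sharp.';
console.log(post.match(standalone)); // → [ '#regex' ]
```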