I may have run into a bug...
When I test a certain regexp in CreateSearchUrl, "Find Matches" returns the correct \1 and \2, but when the same regexp runs in the debugger, no match is found. This depends on whether the regexp contains a certain marker, namely " \(HQ\)". The details:
Input / Title to scrape:
Code:
The Palm Beach Story (HQ) arte 2015-02-16 20h15.mov
Regexp with marker:
Code:
(.+)( \(HQ\).*\.[a-zA-Z0-9]{3})
Returns of "Find Matches" for this regexp:
Code:
\1 = "The Palm Beach Story"
\2 = " (HQ) arte 2015-02-16 20h15.mov"
Regexp without marker:
Code:
(.+)(\.[a-zA-Z0-9]{3})
Returns of "Find Matches" for this regexp:
Code:
\1 = "The Palm Beach Story (HQ) arte 2015-02-16 20h15"
\2 = ".mov"
Debugger Log, running both regexps:
Code:
17:41:44.085 [INFO] Entering Function: CreateSearchUrl
17:41:44.491 [FINE] RUNNING
17:41:44.492 [INFO] Executing RegExp: (.+)(\.[a-zA-Z0-9]{3})
17:41:44.900 [FINE] Loading variable = 1
17:41:44.901 [FINER] executing expression
17:41:44.902 [FINER] Match found = 0 - 55
17:41:44.904 [FINER] Match = \1
17:41:44.906 [FINE] RUNNING
17:41:44.907 [INFO] Executing RegExp: (.+)( \(HQ\).*\.[a-zA-Z0-9]{3})
17:41:45.317 [FINE] Loading variable = 1
17:41:45.319 [FINER] executing expression
17:41:45.323 [FINE] RUNNING
17:41:45.324 [INFO] Leaving Function: CreateSearchUrl
For the regexp with the marker, the debugger logs no "Match found" line and returns no \1. Testing with \2 gives the same result.
For \1 in the regexp without the marker, the "()" have been replaced with "%28" and "%29" and the spaces with "+", even with "URL encode" unchecked (this can be seen in the return variable, $$x). The input is 51 characters long, yet the debugger reports a match at 0 - 55 for the regexp without the marker. Replacing the two parentheses with three-character escapes adds exactly 4 characters (51 + 4 = 55), so maybe the replacement happens before the regexp match, thus spoiling it?
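To check this idea outside ScraperEdit, here is a small Python sketch. It assumes that whatever ScraperEdit does to the input behaves like standard form encoding (Python's quote_plus); I have not confirmed that this is what the tool does internally, and the helper name match_span is just made up for this test.

```python
import re
from urllib.parse import quote_plus

title = "The Palm Beach Story (HQ) arte 2015-02-16 20h15.mov"
with_marker = r"(.+)( \(HQ\).*\.[a-zA-Z0-9]{3})"
without_marker = r"(.+)(\.[a-zA-Z0-9]{3})"

# Assumption: the input is form-encoded before matching
# (space -> "+", "(" -> "%28", ")" -> "%29").
encoded = quote_plus(title)


def match_span(pattern, text):
    """Return the (start, end) span of the first match, or None if there is none."""
    m = re.search(pattern, text)
    return m.span() if m else None


print(len(title), len(encoded))             # 51 55
print(match_span(without_marker, title))    # (0, 51)
print(match_span(without_marker, encoded))  # (0, 55)
print(match_span(with_marker, title))       # (0, 51)
print(match_span(with_marker, encoded))     # None
```

On the encoded string, the regexp without the marker matches 0 - 55, just like in the debugger log, and the regexp with the marker finds nothing, because the encoded text no longer contains a literal space or literal parentheses. That would fit the behaviour I see, but it is only a reproduction under the encoding assumption above.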
Did I overlook something crucial? Please help.
y
P.S.:
ScraperEdit v0.1.2-66
Java 8-Update 31
Mac OS 10.10.2