R: extract the first pattern from the end of a string




I want to extract sizes from strings, which can be:


a <- c("xxxxxxx 2.5 oz (23488)",
"xxxxx /1.36oz",
"xxxxx/7 days /20 ml")



Result I want: 2.5 oz /1.36oz /20 ml





Because the strings vary, I want to extract the pattern backward. That is, I want the first occurrence of \/*(\d+\.*\d*)\s*[[:alpha:]]+ counting from the end of the string. This keeps R from picking up 23488 in the first string and /7 days in the third string.





Does anyone know how I can achieve this?
Thanks!





An idea is to put .* before the pattern to consume the preceding text. Something like ^.*[ /]\b([\d.]+\s*[a-z]+)
– bobble bubble
Aug 20 at 22:04







Won't it capture everything up to and including the size?
– Mr369
Aug 20 at 22:06





@Mr369 yes it will. For the result you're looking for, just refer to capture group 1
– emsimpson92
Aug 20 at 22:07
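
As a rough sketch of the capture-group idea from the comments above, with the backslashes written out and doubled for an R string literal, sub() can replace the whole string with group 1. The trailing .*$ is an addition (not part of the commented regex) so that sub() also discards whatever follows the size. Because the group starts after the space or slash, this variant does not keep a leading /.

```r
a <- c("xxxxxxx 2.5 oz (23488)",
       "xxxxx /1.36oz",
       "xxxxx/7 days /20 ml")

# The greedy ^.* pushes the match as far right as possible, so group 1 holds
# the last "number + lowercase letters" run that follows a space or a slash.
# The trailing .*$ is an assumption added here so the rest of the string is dropped.
pattern <- "^.*[ /]\\b([\\d.]+\\s*[a-z]+).*$"

# Replace the entire string with the contents of capture group 1.
sizes <- sub(pattern, "\\1", a, perl = TRUE)
sizes
```

Note that sub() returns a string unchanged when the pattern does not match, so inputs without a size may need a separate check (for example with grepl()).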





@emsimpson92 You changed the regex and it won't match, say, 40 l now.
– Wiktor Stribiżew
Aug 20 at 22:19







@Mr369 Without a capturing group: ^.*[ /]\b\K[\d.]+\s*[a-z]+ (with perl=TRUE).
– bobble bubble
Aug 20 at 22:20
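
The \K variant from this comment could be tried roughly as follows. This is only a sketch: the backslashes are doubled for the R string literal, and regmatches() with regexpr() is one possible way to pull out the match, not necessarily what the commenter had in mind.

```r
a <- c("xxxxxxx 2.5 oz (23488)",
       "xxxxx /1.36oz",
       "xxxxx/7 days /20 ml")

# \K resets the start of the reported match, so the text consumed by
# ^.*[ /] is matched but left out of the returned value.
pattern_k <- "^.*[ /]\\b\\K[\\d.]+\\s*[a-z]+"

# perl = TRUE is required: \K is not supported by the default regex engine.
sizes_k <- regmatches(a, regexpr(pattern_k, a, perl = TRUE))
sizes_k
```

Unlike the sub() approach, regmatches() silently drops elements that have no match, so the result can be shorter than the input.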







2 Answers



You may use


> a <- c("xxxxxxx 2.5 oz (23488)",
+ "xxxxx /1.36oz",
+ "xxxxx/7 days /20 ml")
> regmatches(a, regexpr("/?\\d+(?:\\.\\d+)?\\s*\\pL+(?!.*\\d(?:\\.\\d+)?\\s*\\pL+)", a, perl=TRUE))
[1] "2.5 oz" "/1.36oz" "/20 ml"



See the regex demo.



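Details

/? - an optional / character
\d+ - one or more digits
(?:\.\d+)? - an optional group: a literal . followed by one or more digits
\s* - zero or more whitespace characters
\pL+ - one or more letters
(?!.*\d(?:\.\d+)?\s*\pL+) - a negative lookahead that fails the match if the same kind of pattern occurs anywhere further to the right:
    .* - any zero or more characters other than line breaks, as many as possible
    \d - a digit
    (?:\.\d+)? - an optional group: a literal . followed by one or more digits
    \s* - zero or more whitespace characters
    \pL+ - one or more letters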





Thanks so much for your answer! What about the case where the string is "xxxxx 08/21/1.38 Oz xxx 08/21/18 xxx 08/21/18"? Using the regex returns 18 xxx instead.
– Mr369
Aug 21 at 17:53






@Mr369 Then, as per your specs, you had better extract all matches and then grab only the last occurrence; see this demo.
– Wiktor Stribiżew
Aug 21 at 19:35
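
A minimal sketch of the "extract all matches, then keep the last one" idea is shown below. The pattern is simply the answer's regex without the lookahead, and the helper name last_or_na is hypothetical; the linked demo may use a different expression.

```r
a <- c("xxxxxxx 2.5 oz (23488)",
       "xxxxx /1.36oz",
       "xxxxx/7 days /20 ml")

# Same building blocks as the answer's regex, minus the negative lookahead.
pattern_all <- "/?\\d+(?:\\.\\d+)?\\s*\\pL+"

# gregexpr() finds every match per string; regmatches() returns them as a list.
all_matches <- regmatches(a, gregexpr(pattern_all, a, perl = TRUE))

# Hypothetical helper: keep the last match, or NA when a string has none.
last_or_na <- function(m) if (length(m) > 0) m[length(m)] else NA_character_
last_sizes <- vapply(all_matches, last_or_na, character(1))
last_sizes
```

Keeping NA for strings without a match preserves the length of the input, which makes it easier to attach the result as a column next to the original strings.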



If you know the names of the units (oz, ml, etc.), you could try something like this:



((\d*|\d*\.\d{0,2})\s?(ml|oz|etc))



See working example.
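
For illustration, the known-units idea might look like this in R. The units vector is a hypothetical, incomplete list, and the pattern is a tightened variation rather than the exact regex above: it requires at least one digit, allows an optional leading /, and ends with a word boundary.

```r
a <- c("xxxxxxx 2.5 oz (23488)",
       "xxxxx /1.36oz",
       "xxxxx/7 days /20 ml")

# Hypothetical list of accepted units; extend it as needed.
units <- c("oz", "ml")

# Optional slash, a number with an optional decimal part, an optional
# whitespace character, then one of the known units followed by a word boundary.
unit_pattern <- paste0("/?\\d+(?:\\.\\d+)?\\s?(?:",
                       paste(units, collapse = "|"),
                       ")\\b")

unit_sizes <- regmatches(a, regexpr(unit_pattern, a, perl = TRUE))
unit_sizes
```

Because "days" is not in the unit list, /7 days is skipped without any backward matching; the trade-off is that every unit has to be listed in advance.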





FYI: This requires the knowledge of all measurement units.
– Wiktor Stribiżew
Aug 20 at 22:20





