No relation to the sports channel.

  • 6 Posts
  • 1.17K Comments
Joined 2 years ago
Cake day: June 9th, 2023








  • Okay, let’s skip the formal logic talk then and go straight to linguistics.

    The question “Good to merge?” does not contain a grammatical error. It is perfectly well-formed by the grammar that native English speakers actually follow in everyday communication. A grammar that fails to parse “Good to merge?” in context cannot parse native English speakers’ actual output.

    Schoolbook English is not native English, because it’s not how native English speakers actually speak. Schoolbook English contains rules that directly contradict native English speakers’ everyday usage.

    (Standard examples include the rule against split infinitives and the rule against ending a sentence with a preposition. These are not grammatical rules of English as it is spoken by native speakers. To boldly assert them is silliness up with which I will not put.)


  • The guideline (as applied) contains a contradiction, so the principle of explosion applies.

    Specifically, there is a contradiction between “native-sounding English” and “no grammatical errors”, when the latter phrase is interpreted in the manner seen here. Native speakers quite often use sentence fragments and in other ways do not follow schoolbook “proper grammar”. In fact, second-language learners often use schoolbook grammar where a native speaker would use a more relaxed register.

    Since the guideline contains a contradiction, it is either impossible to follow (i.e. forbids all communication whatsoever) or impossible to violate (i.e. forbids no communication).



  • The answer given in the spoiler tag is not quite correct!

    Test case

    According to the spoiler, the original regex ^.?$|^(..+?)\1+$ shouldn’t match “abab”, but it does.
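    A quick way to reproduce this test case, sketched in Python. Using Python's `re` module is my assumption here, not part of the original puzzle; the pattern is the original regex as it appears in the full workup.

```python
import re

# The original regex from the puzzle (same pattern as in the full workup).
ORIGINAL = re.compile(r"^.?$|^(..+?)\1+$")

# "abab" is "ab" repeated twice, so the second branch matches it,
# even though it is not a single character repeated.
print(bool(ORIGINAL.match("abab")))  # True
```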

    Corrected regex

    This will match what the spoiler says: ^.?$|^((.)\2+?)\1+$

    Full workup

    Any Perl-compatible regex can be parsed into a syntax tree using the Common Lisp package CL-PPCRE. So if you already know Common Lisp, you don’t need to learn regex syntax too!

    So let’s put the original regex into CL-PPCRE’s parser. (Note, we have to add a backslash to escape the backslash in the string.) The parser will turn the regex notation into a nice pretty S-expression.

    > (cl-ppcre:parse-string "^.?$|^(..+?)\\1+$")
    (:ALTERNATION
     (:SEQUENCE :START-ANCHOR (:GREEDY-REPETITION 0 1 :EVERYTHING) :END-ANCHOR)
     (:SEQUENCE :START-ANCHOR
      (:REGISTER
       (:SEQUENCE :EVERYTHING (:NON-GREEDY-REPETITION 1 NIL :EVERYTHING)))
      (:GREEDY-REPETITION 1 NIL (:BACK-REFERENCE 1)) :END-ANCHOR))
    

    At which point we can tell it’s tricky because there’s a capturing register containing a non-greedy repetition, plus a back-reference to that register. (That’s the (..+?) and the \1 in the original.)

    The top level is an alternation (the | in the original) and the first branch is pretty simple: it’s just zero or one of any character.

    The second branch is the fun one. It’s looking for two or more repetitions of the captured group, which is itself two or more characters. So, for instance, “aaaa”, or “abcabc”, or “abbaabba”, but not “aaaaa” or “abba”.

    So strings that this matches will be of non-prime length: zero, one, or a product of two numbers that are each 2 or greater.

    But it is not true that it matches only “any character repeated a non-prime number of times” because it also matches composite-length sequences formed by repeating a string of different characters, like “abcabc”.
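    The length claim and the counterexample can both be checked with a short sketch. This assumes Python's `re` engine behaves like the Perl-compatible one for this pattern (it does for back-references and non-greedy repetition); the `is_prime` helper is mine, added only for comparison.

```python
import re

ORIGINAL = re.compile(r"^.?$|^(..+?)\1+$")

def is_prime(n: int) -> bool:
    """Trial-division primality test; fine for the small n used here."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

# For one repeated character, the regex matches exactly the non-prime lengths.
for n in range(30):
    assert bool(ORIGINAL.match("a" * n)) == (not is_prime(n))

# But composite-length strings of different characters match too.
print(bool(ORIGINAL.match("abcabc")))  # True: "abc" twice
print(bool(ORIGINAL.match("abba")))    # False: no block of 2+ characters repeats
```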

    If we actually want what the spoiler says — only non-prime repetitions of a single character — then we need to use a second capturing register inside the first. This gives us:

    ^.?$|^((.)\2+?)\1+$

    Specifically, this replaces (..+?) with ((.)\2+?). The \2 matches the character captured by (.), so the whole regex now needs to see the same character throughout.
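    As a sanity check on the corrected regex, here is the same kind of sketch in Python (again my choice of engine, not something the puzzle specified):

```python
import re

CORRECTED = re.compile(r"^.?$|^((.)\2+?)\1+$")

print(bool(CORRECTED.match("aaaa")))    # True: one character, length 4
print(bool(CORRECTED.match("abab")))    # False: mixed characters now rejected
print(bool(CORRECTED.match("abcabc")))  # False: same reason
print(bool(CORRECTED.match("aaaaa")))   # False: length 5 is prime
```

    The first branch is unchanged, so the empty string and any single character still match.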


  • If DNS is transiently down, the most common mail domains are still in local resolver cache. And if you’re parsing live user requests, that means the IP network itself is not in transient failure at the moment. So it takes a pretty narrow kind of failure to trigger a problem… And the outcome is the app tells the user to recheck their email address, they do, and they retry and it works.

    If DNS is having a worse problem, it’s probably down for your mail server too, which means an email would at least sit in the outbound mail spool for a bit until DNS comes back. Meanwhile the user is wondering where their confirmation email is, because people expect email delivery in seconds these days.

    So yeah … yay, tradeoffs!

    (Confirmation emails are still important for closed-loop opt-in, to make sure the user isn’t signing someone else up for your marketing department’s spam, though.)


  • fubo@lemmy.world to Programming@programming.dev · Strings do too many things
    2 years ago

    The only way to correctly validate an email address is to send a message to it, and verify that it arrived.

    If you’re accepting email addresses as user input (e.g. from a web form), it might be nice to check that what’s to the right of the rightmost @ sign is a domain name with an MX or A record. That way, if a user enters a typo’d address, you have some chance of telling them that, instead of handing your MTA an email addressed to user#example.net or user@gmailc.om.

    But the validity of the local-part (left of the rightmost @) is up to the receiving server.
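    A minimal sketch of that pre-check in Python, under some loud assumptions: the function names are hypothetical, and the standard library can only look up address (A/AAAA) records, so this does not check MX. An MX lookup would need a DNS library, which I have left out. The local-part is deliberately not inspected at all.

```python
import socket
from typing import Callable

def split_address(address: str) -> tuple[str, str]:
    """Split at the rightmost '@' into (local_part, domain)."""
    local_part, sep, domain = address.rpartition("@")
    if not sep or not local_part or not domain:
        raise ValueError(f"no usable '@' in {address!r}")
    return local_part, domain

def domain_resolves(domain: str,
                    lookup: Callable[..., object] = socket.getaddrinfo) -> bool:
    """Return True if the lookup finds at least one address record.

    Only A/AAAA records are checked here (assumption: no MX lookup).
    """
    try:
        lookup(domain, None)
    except (OSError, UnicodeError):
        # socket.gaierror is an OSError; malformed names can raise UnicodeError.
        return False
    return True
```

    The `lookup` parameter exists so the check can be exercised without touching the network. A failed lookup is a reason to ask the user to recheck the address, not proof that it is wrong.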



  • fubo@lemmy.world to Programming@programming.dev · Strings do too many things
    2 years ago

    Any time you’re turning a string of input into something else, what you are doing is parsing.

    Even if the word “parser” never appears in your code, the act of interpreting a string as structured data is parsing, and the code that does parsing is a parser.

    Programmers write parsers quite a lot, and many of the parsers they write are ad-hoc, ill-specified, bug-ridden, and can’t tell you why your input didn’t parse right.

    Writing a parser without realizing you’re writing a parser usually leads to writing a bad parser. Bad parsers do things like accepting malformed input that causes security holes. When bad parsers do reject malformed input, they rarely emit useful error messages about why it’s malformed. Bad parsers are often written using regex and duct tape.

    Try not to write bad parsers. If you need to parse something, consider writing a grammar and using a parser library. (If you’re very ambitious, try a parser combinator library.) But at least try to recall something about parsers you learned once way back in a CS class, before throwing regex at the problem and calling it a day.
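    To make the "tell you why your input didn't parse" point concrete, here is a rough Python sketch of a hand-written parser for a made-up `key=value;key=value` format. The format, the function name, and the error wording are all hypothetical, and this is no substitute for a grammar and a parser library; it only shows rejecting malformed input with a reason and a position.

```python
class ParseError(ValueError):
    """Raised with the offset and reason when input does not parse."""

def parse_pairs(text: str) -> dict[str, str]:
    """Parse 'key=value;key=value' into a dict, explaining any failure."""
    result: dict[str, str] = {}
    offset = 0  # position of the current field within the input
    for field in text.split(";"):
        key, sep, value = field.partition("=")
        if not sep:
            raise ParseError(f"offset {offset}: expected '=' in {field!r}")
        if not key:
            raise ParseError(f"offset {offset}: empty key before '='")
        if key in result:
            raise ParseError(f"offset {offset}: duplicate key {key!r}")
        result[key] = value
        offset += len(field) + 1  # +1 for the ';' separator
    return result
```

    For example, `parse_pairs("a=1;b")` raises `ParseError` with the message `offset 4: expected '=' in 'b'` rather than silently returning a partial result.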

    (And now the word “parser” no longer makes sense, because of semantic satiation.)

    By the way, please don’t write regex to try to validate email addresses. Seriously. There are libraries for that; some of them are even good. When people write their own regex to match email addresses, they do things like forget that the hyphen is a valid character in domain names.
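    The hyphen mistake is easy to reproduce. The pattern below is a deliberately naive one of my own invention, shown as an example of what not to write; `my-domain.example` is a made-up name.

```python
import re

# Deliberately naive: \w covers letters, digits and underscore,
# but not the hyphen, which is valid in domain names.
NAIVE = re.compile(r"^\w+@\w+(\.\w+)+$")

print(bool(NAIVE.match("user@example.net")))        # True
print(bool(NAIVE.match("user@my-domain.example")))  # False: rejected because of the hyphen
```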





  • It’s safe to look things up!

    Looking up the name of a crime does not mean that you’re doing that crime.

    If you look up “bank robbery” that doesn’t make you guilty of bank robbery. It doesn’t even mean you’re trying to rob a bank, or even want to rob a bank. You could want to know how bank robbers work. You could be interested in being a bank guard or security engineer. You could be thinking of writing a heist story. You could want to know how safe your money is in a bank: do they get robbed all the time, or not?

    Please, folks, don’t be afraid to look up words. That’s how you learn stuff.