This board is NOT for jschan support. You will not get help here.

I saw your grid captcha on another site running jschan.

I wondered how difficult it would be to break with a slightly modified CNN from 1989 and without any preprocessing, apart from generating the training and validation sets.

tl;dr: My net solves about 2/3 (62.70%) of all grid captchas correctly on the first try. Each individual cell (one of the 16 in a single captcha) is classified correctly with a probability of 96.68%.
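For reference, the two numbers can be related with a quick calculation (my arithmetic, not from the post): if the 16 per-cell errors were independent, 96.68% per cell would give only 0.9668^16, about 58%, of whole captchas. The observed 62.70% is somewhat higher, which suggests the mistakes tend to cluster in the same captchas rather than being spread evenly.

```python
def whole_captcha_rate(per_cell_accuracy: float, cells: int = 16) -> float:
    """Chance that all cells are right, *if* each cell's error were independent."""
    return per_cell_accuracy ** cells


independent = whole_captcha_rate(0.9668)
print(f"expected if independent: {independent:.2%}")  # → expected if independent: 58.26%
print("observed: 62.70%")
```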

The score could probably be improved a bit by adding more training data (which is easy to generate) or by using a more advanced neural net.

I read somewhere that you decided to create a visual captcha so it can't easily be solved by third-party captcha-solving services. But visual captchas can easily be sent to any such service; most of them work with all kinds of formats, not just text. At least visual captchas often cost a bit more than text captchas.

I will not release my source code or model, because I don't want any boards spammed. But I'll answer questions if there are any. Btw, jschan looks good and very configurable. I also think protections like the cost per captcha generation are a good approach.
>>1544 (OP) 
Very cool. I remember another person from zzzchan doing something similar about a year ago; maybe you can search for their posts. I guess the chess/grid captcha is interesting (annoying) enough for multiple people to want to solve it with neural nets hahaha.
It was devised just to stop people putting the text captcha through OCR, because solving even a low percentage of captchas is enough to spam.
The idea is not originally mine; it was adapted from some onion sites as part of a system they called "endgame". They just used filled or unfilled circles in a similar distorted grid arrangement. As far as I know this concept was already broken by AI, so I expected that from the start.
No questions really, besides "how do you make a captcha AI proof with public/open source generator?", but I suspect the answer to that would be worth a lot of $ lol.
Thanks for the hint about zzzchan, but it looks like it's a bit difficult to find. It would be interesting to see how others beat this challenge.

Yesterday I also managed to increase the accuracy by about 13 percentage points: I played around with the loss function and modified it so that it scores better when a whole captcha is solved, not just when "most individual hits are correct". Now 3/4 (76.15%) of captchas are solved correctly instead of 2/3 (62.70%).
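The exact loss used here wasn't shared. One hypothetical way to push a model toward whole-captcha correctness is to add a penalty on the single worst cell to the usual mean binary cross-entropy, since one wrong cell fails the captcha. The sketch below is plain Python for readability (the same idea works on PyTorch tensors); `grid_loss` and `worst_weight` are made-up names.

```python
import math


def bce(p: float, y: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one cell; p is the predicted chance the cell is marked."""
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def grid_loss(probs: list[float], labels: list[int], worst_weight: float = 1.0) -> float:
    """Mean per-cell BCE plus a penalty on the single worst cell.

    A captcha fails if any one cell is wrong, so the max term keeps the
    model from trading one bad cell for many slightly better ones.
    """
    per_cell = [bce(p, y) for p, y in zip(probs, labels)]
    return sum(per_cell) / len(per_cell) + worst_weight * max(per_cell)


labels = [1] * 16
all_ok = [0.9] * 16            # every cell right, none very confident
one_bad = [0.99] * 15 + [0.3]  # one wrong cell fails the whole captcha
print(grid_loss(all_ok, labels, worst_weight=0.0) > grid_loss(one_bad, labels, worst_weight=0.0))  # True: plain mean prefers one_bad
print(grid_loss(all_ok, labels) < grid_loss(one_bad, labels))  # True: the worst-cell term prefers all_ok
```

The example shows why a plain per-cell average can mislead: it rates the failing prediction as better because 15 cells are very confident.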

When I have the motivation, I may later test the text captcha with a similar neural net, too.
Yes, with standard OCR you can often solve those captchas well; I did that when I "broke" my first captchas on file-hosting sites.

>"how do you make a captcha AI proof with public/open source generator?"
Difficult, since with public source code anybody can generate as much training material as they like. Btw, I used around 38k training examples for the CNN, without any data augmentation. It would be hard to solve that many by hand just for training...
I think a solution would be to offer lots of captcha customization to the admins, so that not every captcha from the same software looks the same. Maybe offer something like a "captcha addon" with a basic captcha and the possibility to add further image processing operations on top of it.
But even without the original generation code, it often doesn't look too difficult to build a similar captcha generator yourself for training. Of course, this is all fairly advanced stuff and not to be expected from any random bypasser.

I think the captchas that look really hard to solve with AI are Google's reCAPTCHAs. When you have a bad (IP) reputation, you have to solve something like ten different puzzles, and a single mistake can break the whole process. The puzzle images also look heavily altered. hCaptchas are easier to solve, but sometimes they serve you new types of captchas you would not expect. Once I unexpectedly got a "draw a box around the biggest cat in the picture". I liked that.
I looked on zzz and couldn't find the posts either. It might even have been on this site, before the migration to zzz. Basically, they did something similar with identifying the points, plus image preprocessing to remove the "junk" around the edges of grid captchas.

>offer lots of captcha customization to the admins, so that not every captcha from the same software looks the same
Yes, there is some minor customization, like the amount of distortion, the size of the image, and whether it is 3x3, 4x4, etc. I like the idea of extra image processing operations. I think an option for custom character sets and/or fonts would be really good for both grid and text captchas. I can add completely new types of captcha too; the code is pretty extensible. I just can't get too fancy, because maintaining good UX and compatibility with noscript is important :^)

I like to look at "dread" and their "endgame" system because they are constantly fighting bots and smart people with AI. When I last checked, they were using a captcha image like the video and pic related. The 6 input fields had some CSS so that, when a field is focused, the captcha image (used as the "background-image" of a small peephole) is moved to reveal the matching character. It's still fully noscript compatible. It makes it more difficult to extract what the challenge is (you have to parse the CSS or maybe use a headless browser), and it reduces server load from generating images, because the same image can be reused many times with different 6-letter challenges. It's flexible, so you could increase the character count for more difficulty, and segment it for tor/lokinet, etc.
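A rough sketch of how a server could emit that per-challenge CSS, based only on the description above (dread's real markup isn't shown in the thread). It assumes the inputs have hypothetical ids `c0` to `c5`, a later sibling element with class `peephole`, and a known pixel offset for each character in the shared image; each focused input shifts the background so its character sits in the peephole.

```python
import random


def peephole_css(positions: dict[str, tuple[int, int]], challenge: str,
                 image_url: str, hole_px: int = 40) -> str:
    """Build CSS that shows one character of a shared image per focused input.

    positions maps each character to the (x, y) pixel offset of its top-left
    corner in the shared image. Assumes markup like
    <input id="c0">...<input id="c5"><div class="peephole"></div>,
    so the `~` sibling combinator can reach the peephole.
    """
    rules = [
        f".peephole {{ width: {hole_px}px; height: {hole_px}px; "
        f"background: url('{image_url}') no-repeat; }}"
    ]
    for i, ch in enumerate(challenge):
        x, y = positions[ch]
        # Shift the image left/up so the character's corner lines up with the peephole's.
        rules.append(f"#c{i}:focus ~ .peephole {{ background-position: {-x}px {-y}px; }}")
    return "\n".join(rules)


def new_challenge(positions: dict[str, tuple[int, int]], length: int = 6, rng=random) -> str:
    # The same image can back many challenges: only the CSS changes per request.
    return "".join(rng.choice(list(positions)) for _ in range(length))
```

Because the answer lives in the stylesheet rather than in one static image, a solver has to render the page or parse the CSS before it can even see the challenge, which matches the trade-off described above.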

>training materials, google and hcaptcha
On the training materials: like you said, it is difficult to defend when it is open source. Google's reCAPTCHA and hCaptcha are strong because they have almost unlimited data sets and change them regularly. You can train to identify fire hydrants until they start showing fire hydrants from a different country... where they are painted a different color... and add a bunch of noise... because you are using tor... And just when you think you have it, they start asking for crosswalks instead.
But they also innovate on the types of questions and add more specific requirements. I recently saw an hCaptcha like "pick all the sea planes flying to the right", where there were multiple types of aircraft flying in different directions. I wouldn't be surprised if Google did something like that with "pick all the squares with GREEN traffic lights" hahaha.
Plus, with such a huge volume, they can learn from the captcha inputs, e.g. by allowing some incorrectness, assigning scores, etc. Also, if they have a bunch of "trusted" users with good IP reputation who solve many captchas correctly, they can have those people solve "dummy" captchas that have no known answer and use the results to train their own AI, because those users will be honest and solve the captcha correctly anyway. (I think Google already does this.) It's like a self-sustaining ecosystem. Something like that is quite grand and easily a project of its own.
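A rough sketch of the "trusted users label unknown captchas" idea: answers to a captcha with no known solution are only counted from trusted users, and a label is accepted once enough of them agree. `LabelPool`, the vote count and the agreement threshold are all hypothetical; this is not a description of Google's actual system.

```python
from collections import Counter, defaultdict
from typing import Optional


class LabelPool:
    """Crowd-label captchas that have no known answer, using trusted users only."""

    def __init__(self, min_votes: int = 3, min_agreement: float = 0.8):
        self.min_votes = min_votes
        self.min_agreement = min_agreement
        self.votes = defaultdict(Counter)  # captcha id -> Counter of submitted answers

    def submit(self, captcha_id: str, answer: str, trusted: bool) -> None:
        # Untrusted answers are ignored so bots cannot poison the labels.
        if trusted:
            self.votes[captcha_id][answer] += 1

    def label(self, captcha_id: str) -> Optional[str]:
        """Return the agreed answer once enough trusted users concur, else None."""
        counts = self.votes.get(captcha_id)
        if not counts:
            return None
        answer, top = counts.most_common(1)[0]
        total = sum(counts.values())
        if total >= self.min_votes and top / total >= self.min_agreement:
            return answer
        return None
```

In practice "trusted" would come from signals like IP reputation, and the user would still be graded on a paired captcha whose answer is known.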

Thanks for all the input, it's very interesting 👍
>>1544 (OP) 
>But I'll answer question if there are any
What programming language did you use for the AI?
Yeah, those dread captchas look really interesting, and it's great that they work without JS.
Btw, I read that even reCAPTCHA is (was?) available in a non-JS version. But it costs > 1000 USD/month or so; it's only in Google's premium tier.

I used Python 3 with the PyTorch framework. Frameworks like TensorFlow or Keras should also work, but PyTorch is very straightforward and flexible.
While generating the training set for my text captcha solver, I think I stumbled upon a bug that affects captcha solving.

In rare cases, the captcha solution saved to the database does not exactly match the characters shown in the captcha image. In all the cases I found, the last letter is a "u" that is missing from the solution string. But there are other cases where the last letter is a "u" and it is stored correctly in the database.

When the last letter is missing, the captcha is unsolvable for the user, because they don't expect an incomplete solution.

In a set of 100 generated captchas, there are two such entries.
Here are four examples: two are correct, two are wrong.

See the related generated images for comparison. The solution is the "const text" from text.js.
Thanks. Will fix soon. https://gitgud.io/fatchan/jschan/-/issues/464
The "u" is the start of "undefined" i think KEK
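The thread doesn't say how text.js ended up producing "undefined". One common way this happens in JavaScript (an assumption here, not a confirmed diagnosis) is an off-by-one when picking random characters, e.g. `chars[Math.round(Math.random() * chars.length)]`, which can yield the index `chars.length`; reading past the end gives `undefined`, and string concatenation turns it into the text "undefined". The Python sketch mimics that JavaScript behaviour with a hypothetical charset.

```python
import math
import string

CHARSET = string.ascii_lowercase + string.digits  # hypothetical 36-character set


def js_char_at(text: str, index: int) -> str:
    """Mimic JavaScript: text[index] past the end is undefined, and
    concatenating undefined into a string produces "undefined"."""
    return text[index] if 0 <= index < len(text) else "undefined"


def buggy_pick(r: float) -> str:
    # Like CHARSET[Math.round(Math.random() * CHARSET.length)]: for r close to 1
    # the rounded index equals len(CHARSET), one past the last character.
    return js_char_at(CHARSET, math.floor(r * len(CHARSET) + 0.5))


def fixed_pick(r: float) -> str:
    # Like Math.floor(Math.random() * CHARSET.length): r < 1 keeps the index in range.
    return CHARSET[math.floor(r * len(CHARSET))]
```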

Also, I showed this thread to another jschan site admin. We brainstormed a bit and have some more cool ideas for adapting and improving the captchas. I will make it a higher priority for upcoming versions.
I read some of the posts and I think the inevitable conclusion is: trying to make an unbreakable captcha punishes legitimate users more than spammers, hence proof of work, where you create a de facto account that can be banned.
It is a losing battle re captcha. But OP makes a good point that you don't have to make an unbeatable captcha. Making it customisable to the point that every site running jschan or lynxchan isn't beaten by the same solver is a good improvement. The customization does not necessarily mean making it more difficult, but more different.

While proof of work fights the lone spammer, it is relatively weak against somebody with a botnet or powerful hardware. PoW solving is automated by design, so you must use captcha in tandem with PoW. If your captcha can also be blanket-automated by one solver like the one described above, you haven't done much to fix the problem. Sitting next to me right now I have 2x 3900X and 1x 5900X computers, and even my laptop is pretty beefy. Are you really going to increase the difficulty so much that this would not be enough to spam the average site?
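To illustrate the asymmetry, here is a minimal hashcash-style proof of work (a generic sketch, not the haproxy/kikeflare implementation mentioned below): the client searches for a nonce whose SHA-256 hash has a given number of leading zero bits. Each extra difficulty bit doubles the expected work for everyone, the spammer's 5900X and a user's old phone alike, while the server verifies with a single hash.

```python
import hashlib
import itertools


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    bits = 0
    for byte in digest:
        if byte:
            return bits + 8 - byte.bit_length()
        bits += 8
    return bits


def pow_digest(challenge: str, nonce: int) -> bytes:
    return hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()


def solve(challenge: str, difficulty: int) -> int:
    """Brute-force a nonce; on average this takes about 2**difficulty hashes."""
    for nonce in itertools.count():
        if leading_zero_bits(pow_digest(challenge, nonce)) >= difficulty:
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """The server checks a submitted nonce with a single hash."""
    return leading_zero_bits(pow_digest(challenge, nonce)) >= difficulty
```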

Also, if you increase the work required so much that users must wait several minutes to post, legitimate users would rather not post at all. It's why I will never use kohlchan. And any concession you make to accommodate mobile phones or older computers will be used by spammers. Imo, you are better off running a monero node and forcing users to mine a low-difficulty share, so at least you don't waste all that CPU power... lol. Then spammers are even paying you for every spam post they make.

Still, I do have a proof of work system, but it is implemented as part of haproxy lua script before visitors can even view the page. https://kikeflare.com (not part of jschan itself)
Imagine you let admins choose:
What should be selected? ...
What should not be selected? ...
Write the question: "Pick the squares with arrows pointing left"
The captcha now looks like pic related

This one won't be beaten by the AI trained by OP. He will have to figure out the characters himself and then start over. It is easier to come up with a new question than it is to train the AI to beat it.
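A minimal sketch of what such an admin-defined question could look like on the server side, assuming a hypothetical `GridQuestion` config (not an existing jschan option): the admin lists which symbols to select and which to avoid, writes the question text, and the generator derives the answer from the grid.

```python
import random
from dataclasses import dataclass


@dataclass
class GridQuestion:
    """Hypothetical admin-defined grid captcha (not an existing jschan option)."""
    question: str
    select: list[str]  # symbols the user must pick
    avoid: list[str]   # symbols that must not be picked
    size: int = 4

    def generate(self, rng: random.Random) -> tuple[list[str], set[int]]:
        """Fill a size x size grid; return its cells and the set of correct cell indices."""
        symbols = self.select + self.avoid
        cells = [rng.choice(symbols) for _ in range(self.size * self.size)]
        answer = {i for i, s in enumerate(cells) if s in self.select}
        return cells, answer


left = GridQuestion("Pick the squares with arrows pointing left",
                    select=["←"], avoid=["→", "↑", "↓"])
cells, answer = left.generate(random.Random(1))
```

Swapping the symbol lists and question text yields a new captcha without touching the generator, which is what makes retraining cost more than reconfiguring.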
>The "u" is the start of "undefined" i think KEK
That explains it. I was already wondering what the black blob next to the last "u" in 629270d9a38c0bccdd79dcfc.jpg was. It seems to be the "n" of "undefined" :DD

I did some training with the text captchas on a CNN similar to the one in the OP. I modified it so that instead of a single output with a vector of 16 sigmoids, it has 6 outputs, each a vector of size 36 with softmax (so each output predicts exactly one character). It's a naive approach from scratch. My first conclusion after a few hours of training: it's way more difficult than the grid captcha. The solution space is not just 2^16 (65,536) but 36^6 (about 2.18 billion), and I barely had any success in the training time, though at least after nearly 100 epochs the network was still improving. But I don't have time to train it for days and weeks, so I've cancelled it for now.
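A small sketch of how the output side described above can be decoded: each of the 6 heads gives 36 scores, a softmax turns them into probabilities, and the most likely character per head forms the guess. `CHARSET` is an assumed lowercase-plus-digits set, and multiplying the per-head maxima into a joint confidence is an addition for illustration.

```python
import math
import string

CHARSET = string.ascii_lowercase + string.digits  # assumed 36 symbols, one per softmax class


def softmax(scores: list[float]) -> list[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode(heads: list[list[float]]) -> tuple[str, float]:
    """Turn 6 heads of 36 raw scores into a string plus the model's joint confidence."""
    text, confidence = [], 1.0
    for scores in heads:
        probs = softmax(scores)
        best = max(range(len(probs)), key=probs.__getitem__)
        text.append(CHARSET[best])
        confidence *= probs[best]
    return "".join(text), confidence


print(2 ** 16, 36 ** 6)  # 65536 2176782336
```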

I think using an already pretrained net for character detection as a foundation would help speed up the process. When building neural nets you don't have to start from zero (like I did with the grid captcha); you can also make use of other models and model architectures. For example, you can take an image classification net and add a new layer on top of its outputs. You can also freeze any layers of the neural net you don't want to train in the process. Maybe this is the next thing I'm going to do.
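In PyTorch, freezing is usually done by setting `requires_grad = False` on the pretrained backbone's parameters and giving the optimizer only the new head's parameters. The framework-free toy below shows the same structure: a fixed stand-in "backbone" (hypothetical, never updated) produces features, and only a new logistic-regression output layer is trained on them.

```python
import math


def frozen_backbone(x: tuple[float, float]) -> list[float]:
    """Stand-in for a pretrained feature extractor; its weights are never updated."""
    return [x[0] + x[1], x[0] - x[1]]


def predict(head, x) -> float:
    w, b = head
    f = frozen_backbone(x)
    z = sum(wi * fi for wi, fi in zip(w, f)) + b
    return 1.0 / (1.0 + math.exp(-z))


def train_head(data, epochs: int = 200, lr: float = 0.5):
    """Train only the new output layer (logistic regression) on frozen features."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            f = frozen_backbone(x)
            g = predict((w, b), x) - y  # d(BCE)/dz for a sigmoid output
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b
```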

I think an issue with the grid captcha is that it's very "simple". The net only has to detect certain areas of interest and map them to 0 or 1. This could be done in a very small net, since "it just maps pixels" from a bigger array to a smaller array. Also the solution space is rather narrow. One idea: maybe try raising the complexity... increasing the grid size or moving from binary decisions (mark all black letters) to more complex ones like "type all black letters in the grid".
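To make the "it just maps pixels" point concrete: if cell boundaries are roughly known, the image can be cut into n x n cells and each cell judged on its own, so the whole captcha reduces to 16 independent yes/no decisions. A sketch assuming an undistorted image given as a list of pixel rows; `is_marked` stands in for whatever small classifier is used (real grid captchas add distortion, so boundaries are fuzzier).

```python
from typing import Callable


def split_cells(image: list[list[int]], n: int) -> list[list[list[int]]]:
    """Split an image (list of pixel rows) into n*n cells, in row-major order."""
    ch, cw = len(image) // n, len(image[0]) // n
    return [
        [row[c * cw:(c + 1) * cw] for row in image[r * ch:(r + 1) * ch]]
        for r in range(n)
        for c in range(n)
    ]


def solve_grid(image: list[list[int]], n: int,
               is_marked: Callable[[list[list[int]]], bool]) -> list[int]:
    # Each cell is classified independently: no cell's answer depends on another.
    return [int(is_marked(cell)) for cell in split_cells(image, n)]
```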
Will be waiting for this in the next fatchan update
IIRC, when it gets to that point, people just pay humans to solve captchas. I think someone was using something like that to spam a lynxchan site; that was when I gave up leaning on captchas for security.
If they paid, that means the captcha is good enough hahahaha
That vid you got there got me thinking it might be good if you had it so you had to pick em in a specific order and asked, for example, player two does a dragon punch (left, down, downleft). That's a specific example that I don't think would be great; not that many people would know what it means. Figure it illustrates an idea that might not be shit.
>moving from binary decisions (mark all black letters) to more complex ones like "type all black letters in the grid".
Right now it doesn't use characters on a normal keyboard, but this sounds like it could work for a harder text captcha.

>you had it so you had to pick em in a specific order and asked, for example, player two does a dragon punch (left, down, downleft).
Ordering might make it harder if you add multiple questions, but I think it still has the problem OP describes:
>The net only has to detect certain areas of interest and map them to 0 or 1. This could be done in a very small net, since "it just maps pixels" from a bigger array to a smaller array. Also the solution space is rather narrow. 
Also, I don't want anything that requires specific knowledge, like what the dragon punch button combo is.
I think creating a dependency between squares could make it stronger, so that the AI can't solve it by only analysing individual squares, but must account for other parts of the image. Maybe the grid could have a few symbols and the rest arrows, and it says "pick all arrows pointing at symbol X".
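A sketch of how the server could compute the answer for "pick all arrows pointing at symbol X": from each arrow, walk in its direction and check whether the target symbol appears. Unlike grid v1, a cell's answer now depends on other cells. The symbols are placeholders, and letting the ray pass over other cells (rather than stopping at the first symbol) is a design assumption.

```python
DIRECTIONS = {"←": (0, -1), "→": (0, 1), "↑": (-1, 0), "↓": (1, 0)}


def arrows_pointing_at(grid: list[list[str]], target: str) -> set[tuple[int, int]]:
    """Return (row, col) of every arrow whose ray reaches a `target` cell.

    Assumption: the ray passes over other cells, so only direction and
    alignment matter, not what lies in between.
    """
    answer = set()
    rows = len(grid)
    for r, row in enumerate(grid):
        for c, symbol in enumerate(row):
            if symbol not in DIRECTIONS:
                continue
            dr, dc = DIRECTIONS[symbol]
            y, x = r + dr, c + dc
            while 0 <= y < rows and 0 <= x < len(grid[y]):
                if grid[y][x] == target:
                    answer.add((r, c))
                    break
                y, x = y + dr, x + dc
    return answer


grid = [
    ["→", "↓", "★"],
    ["↑", "●", "←"],
    ["→", "★", "↑"],
]
```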
>"pick all arrows pointing at symbol X"
That sounds interesting. Will definitely throw my neural net on it to see how it performs!
Tracking: https://gitgud.io/fatchan/jschan/-/issues/469
Will update again when there is progress re the dread mode, or my "arrows pointing at x" idea.
Update, "grid v2" in development. Here are some samples with the answer filled in. (noise/edge effects off for clearer illustration)
For the regular text captcha, there should be an option to set the maximum captcha length. Right now it is hardcoded to 6; an input box in the text captcha options could change that. There also needs to be a limit on the maximum number of characters that can be shown on the text captcha image.
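A minimal sketch of such a setting, with hypothetical names and example defaults (210 px image width, 25 px per glyph, minimum length 4): the admin's requested length is clamped so every character still fits on the image.

```python
import random
import string


def effective_length(requested: int, image_width: int = 210,
                     glyph_width: int = 25, minimum: int = 4) -> int:
    """Clamp an admin-configured length so every character fits on the image.

    If the image is too narrow for `minimum` glyphs, the minimum still wins;
    a real settings page would reject that configuration instead.
    """
    fits = image_width // glyph_width
    return max(minimum, min(requested, fits))


def make_text_captcha(requested_length: int, rng: random.Random,
                      charset: str = string.ascii_lowercase + string.digits) -> str:
    length = effective_length(requested_length)
    return "".join(rng.choice(charset) for _ in range(length))
```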
It's ultimately impossible to win with captchas. The future of chans is a mandatory pass regime: you get a posting token for a small monero fee. It allows anonymous networks, and no ads forever.
Saw this captcha earlier today and thought of this thread, so here it is.
This could be an option in the future. The next version includes a permission to bypass captchas, so it's possible to make an integration where paying a fee grants this permission. Captcha on its own is not the only tool we have; there is always a multi-faceted approach.

Thank you for your inputs. Merge requests are always open for you to implement these new captcha types on yourself. The captcha improvements for 0.8.x are already finalised. If I add another new captcha type myself, it will more than likely be the "peephole" or clock captcha, inspired from some popular onion sites. You can see an example of the peephole https://gitgud.io/fatchan/haproxy-protection/-/issues/8.
0.8.x includes a new captcha type and improved customisation of existing captchas. The new captcha type tries to improve on a weakness of grid v1 discussed above, and the new customisation options will allow jschan instances to have captchas that differ from each other, reducing the impact and likelihood of a blanket captcha solver.
>Thank you for your inputs. Merge requests are always open for you to implement these new captcha types on yourself.
>new captcha types
I did not mean a new captcha type, but rather a small improvement to the text captcha: changing the character limit in the global settings.
Nice, the length should also be changeable in the global captcha settings.
I know you don't like this but I'm not opening up irc to say it
What you did with the board settings page is very nice, good looking, sleek, and well organized.
