I have several technical questions I want to ask. I went to Stack Overflow, but it blocks registration from a VPN. It’s really fucking annoying. I could buy a residential IP to bypass this, but I’d rather just not use these enshittified platforms that are so hostile to VPNs.

Is there any decent alternative to Stack Overflow? I have tried getting AI answers to my technical questions, but they are not good.

And no, I can’t just create a GitHub ID over the VPN to log in; they block GitHub logins based on IP as well.

  • someone@lemmy.today (OP)
    1 day ago

    I am trying to run a handwriting OCR program on some large PDFs of scanned old journals. Transcribing them by hand would take a very long time. I have an AMD GPU with ROCm installed. I tried to configure pip with ROCm and failed. I was considering pulling a PyTorch Docker image, setting up Gradio in it, and then trying to get Gradio to run TrOCR. I have never run Gradio. I have “easier” LLM programs like LM Studio and Ollama, but I don’t know if they can run TrOCR. There is AMD documentation on running OCR (https://rocm.docs.amd.com/projects/ai-developer-hub/en/latest/notebooks/inference/ocr_vllm.html), but it’s not clear whether it works well with handwriting. TrOCR has a model trained specifically for handwriting. It’s also on Hugging Face, which I don’t know how to use that well.
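Before touching Docker or Gradio, it may help to confirm whether the PyTorch that is already installed can see the AMD GPU at all. The sketch below is only a diagnostic, not a fix. It relies on the fact that ROCm builds of PyTorch expose the GPU through the same `torch.cuda` API as CUDA builds; the function name `describe_torch_backend` is made up for this example.

```python
import importlib.util


def describe_torch_backend():
    """Report whether PyTorch is importable and which GPU backend it was built for."""
    info = {"torch_installed": False, "gpu_available": False, "backend": None}
    if importlib.util.find_spec("torch") is None:
        return info  # PyTorch is not installed in this environment

    import torch

    info["torch_installed"] = True
    # ROCm builds answer through torch.cuda as well.
    info["gpu_available"] = torch.cuda.is_available()
    # ROCm builds set torch.version.hip; CUDA builds set torch.version.cuda.
    if getattr(torch.version, "hip", None):
        info["backend"] = "rocm " + torch.version.hip
    elif getattr(torch.version, "cuda", None):
        info["backend"] = "cuda " + torch.version.cuda
    else:
        info["backend"] = "cpu-only build"
    return info


print(describe_torch_backend())
```

If the backend comes back as a CPU-only or CUDA build, the installed wheel was not built for ROCm, and no amount of Gradio configuration would change that.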

    • jdr8@lemmy.world
      24 hours ago

      Ok excellent.

      Let’s go step by step.

      You say you tried to configure pip but failed.

      What was the error? Any logs? Did you follow the steps from the link you provided?

      • someone@lemmy.today (OP)
        24 hours ago

        I don’t remember exactly, but I have ROCm 7.2 installed, and there was something I was trying to install with pip for ROCm that just wouldn’t work; it was as if the ROCm 7.2 build wasn’t out yet or the link didn’t work. The LLM tried multiple suggestions and they all failed, so I gave up. When I said “inside” pip, I don’t know if that’s accurate. I am very new to pip. I am decent at Linux, but I only know a small amount of coding and lack Python familiarity.

        • jdr8@lemmy.world
          23 hours ago

          So if you follow the instructions from the link again, can you make it work?

          • someone@lemmy.today (OP)
            23 hours ago

            That’s not for TrOCR; it’s for general OCR, which may not work for handwriting.

            I did try some of the GPT steps:

            pip install --upgrade transformers pillow pdf2image
            
            

            getting some errors:

            ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━ 3/4 [transformers]  WARNING: The scripts transformers and transformers-cli are installed in '/home/user/.local/bin' which is not on PATH.
              Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
            ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
            mistral-common 1.5.2 requires pillow<11.0.0,>=10.3.0, but you have pillow 12.1.0 which is incompatible.
            moviepy 2.1.2 requires pillow<11.0,>=9.2.0, but you have pillow 12.1.0 which is incompatible.
            
            
            

            This is what GPT said to run, but it makes no sense to me because I don’t even have TrOCR downloaded, let alone running.

            Install packages: pip install --upgrade transformers pillow pdf2image
            Ensure poppler is installed:
            
            Ubuntu/Debian: sudo apt install -y poppler-utils
            macOS: brew install poppler
            
            Execute: python3 trocr_pdf.py input.pdf output.txt
            

            That’s the script to save and run.

            #!/usr/bin/env python3
            import sys
            from pdf2image import convert_from_path
            from PIL import Image
            import torch
            from transformers import TrOCRProcessor, VisionEncoderDecoderModel
            
            def main(pdf_path, out_path="output.txt", dpi=300):
                device = "cuda" if torch.cuda.is_available() else "cpu"
                model_name = "microsoft/trocr-base-handwritten"
                processor = TrOCRProcessor.from_pretrained(model_name)
                model = VisionEncoderDecoderModel.from_pretrained(model_name).to(device)
            
                pages = convert_from_path(pdf_path, dpi=dpi)
                results = []
                for i, page in enumerate(pages, 1):
                    page = page.convert("RGB")
                    # downscale if very large to avoid OOM
                    max_dim = 1600
                    if max(page.width, page.height) > max_dim:
                        scale = max_dim / max(page.width, page.height)
                        page = page.resize((int(page.width*scale), int(page.height*scale)), Image.Resampling.LANCZOS)
            
                    pixel_values = processor(images=page, return_tensors="pt").pixel_values.to(device)
                    generated_ids = model.generate(pixel_values, max_length=512)
                    text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
                    results.append(f"--- Page {i} ---\n{text.strip()}\n")
            
                with open(out_path, "w", encoding="utf-8") as f:
                    f.write("\n".join(results))
                print(f"Saved OCR text to {out_path}")
            
            if __name__ == "__main__":
                if len(sys.argv) < 2:
                    print("Usage: python3 trocr_pdf.py input.pdf [output.txt]")
                    sys.exit(1)
                pdf_path = sys.argv[1]
                out_path = sys.argv[2] if len(sys.argv) > 2 else "output.txt"
                main(pdf_path, out_path)
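One caveat about the script above: as far as I know, the TrOCR checkpoints are meant for images of a single line of text, so passing a whole journal page to `generate` is likely to return only a short, poor transcription. A common workaround is to cut each page into line strips first. The sketch below shows only the splitting idea, on a toy list of per-row ink counts; `find_line_bands` and its thresholds are made up for illustration, and real handwriting with slanted or touching lines would need something more robust.

```python
def find_line_bands(row_ink, min_ink=1, min_height=2):
    """Return (top, bottom) row ranges where consecutive rows contain ink.

    row_ink[i] is the number of dark pixels in row i of a binarized page.
    bottom is exclusive, so a band covers rows top .. bottom - 1.
    """
    bands = []
    start = None
    for y, ink in enumerate(row_ink):
        if ink >= min_ink and start is None:
            start = y  # a text line begins
        elif ink < min_ink and start is not None:
            if y - start >= min_height:
                bands.append((start, y))  # a text line ends
            start = None
    # Close a band that runs to the bottom edge of the page.
    if start is not None and len(row_ink) - start >= min_height:
        bands.append((start, len(row_ink)))
    return bands


# Toy profile: two text lines separated by blank rows.
profile = [0, 0, 5, 9, 7, 0, 0, 0, 4, 8, 6, 3, 0]
print(find_line_bands(profile))  # [(2, 5), (8, 12)]
```

Each band could then be cropped from the page with Pillow's `page.crop((0, top, page.width, bottom))` and passed to the processor one strip at a time.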
            
            
            • jdr8@lemmy.world
              23 hours ago

              Ok so from the error, you have a version of pillow that is incompatible.

              You have to downgrade pillow to a version below 11 (for example 10.4.0), because mistral-common and moviepy both require pillow<11.

              That’s the first step.
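The conflict messages in the pip output state the allowed ranges directly: mistral-common requires `pillow>=10.3.0,<11.0.0` and moviepy requires `pillow>=9.2.0,<11.0`, so a version that satisfies both has to sit in `>=10.3.0,<11.0`. Here is a rough sketch for checking the installed version against that range using only the standard library. The version parsing is deliberately simplified (numeric parts only) and the helper names are made up; real tooling would use the `packaging` library.

```python
import re
from importlib import metadata


def parse_version(text):
    """Turn '10.3.0' into (10, 3, 0), padding short versions such as '11.0'."""
    nums = [int(n) for n in re.findall(r"\d+", text)[:3]]
    return tuple(nums + [0] * (3 - len(nums)))


def in_range(version, lower, upper):
    """True if lower <= version < upper."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


found = installed_version("pillow")
if found is None:
    print("pillow is not installed in this environment")
else:
    ok = in_range(found, "10.3.0", "11.0.0")
    print(f"pillow {found}: {'compatible' if ok else 'outside >=10.3.0,<11.0'}")
```

With the pillow 12.1.0 from the log, this reports the version as outside the range.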

              • someone@lemmy.today (OP)
                23 hours ago

                I don’t trust big tech to not extract data and metadata and save it. Many companies get served with government requests to save data and keep it secret. Even if handwritingocr.com doesn’t have such an agreement, it could run on AWS and that has an agreement. I would much rather do this locally. Some of the writings are confidential. Handwritingocr.com says data is encrypted in transit and at rest, but it’s not open source and even if it were I can’t verify the server code.

                Also, Tesseract is CPU-only, right? It will be so slow.

                • jdr8@lemmy.world
                  22 hours ago

                  Fair point.

                  So what about TensorFlow and some local LLM to do the job?

                  You just need to find a reliable model on Hugging Face, for example.

            • someone@lemmy.today (OP)
              23 hours ago

              Terminal error after running GPT code:

              python3 trocr_pdf.py small.pdf output.txt
              Traceback (most recent call last):
                File "/home/user/.local/lib/python3.10/site-packages/transformers/utils/hub.py", line 479, in cached_files
                  hf_hub_download(
                File "/home/user/.local/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
                  return fn(*args, **kwargs)
                File "/home/user/.local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1007, in hf_hub_download
                  return _hf_hub_download_to_cache_dir(
                File "/home/user/.local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1124, in _hf_hub_download_to_cache_dir
                  os.makedirs(os.path.dirname(blob_path), exist_ok=True)
                File "/usr/lib/python3.10/os.py", line 215, in makedirs
                  makedirs(head, exist_ok=exist_ok)
                File "/usr/lib/python3.10/os.py", line 225, in makedirs
                  mkdir(name, mode)
              PermissionError: [Errno 13] Permission denied: '/home/user/.cache/huggingface/hub/models--microsoft--trocr-base-handwritten'

              The above exception was the direct cause of the following exception:

              Traceback (most recent call last):
                File "/home/user/Documents/trocr_pdf.py", line 39, in <module>
                  main(pdf_path, out_path)
                File "/home/user/Documents/trocr_pdf.py", line 11, in main
                  processor = TrOCRProcessor.from_pretrained(model_name)
                File "/home/user/.local/lib/python3.10/site-packages/transformers/processing_utils.py", line 1394, in from_pretrained
                  args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, **kwargs)
                File "/home/user/.local/lib/python3.10/site-packages/transformers/processing_utils.py", line 1453, in _get_arguments_from_pretrained
                  args.append(attribute_class.from_pretrained(pretrained_model_name_or_path, **kwargs))
                File "/home/user/.local/lib/python3.10/site-packages/transformers/models/auto/image_processing_auto.py", line 489, in from_pretrained
                  raise initial_exception
                File "/home/user/.local/lib/python3.10/site-packages/transformers/models/auto/image_processing_auto.py", line 476, in from_pretrained
                  config_dict, _ = ImageProcessingMixin.get_image_processor_dict(
                File "/home/user/.local/lib/python3.10/site-packages/transformers/image_processing_base.py", line 333, in get_image_processor_dict
                  resolved_image_processor_files = [
                File "/home/user/.local/lib/python3.10/site-packages/transformers/image_processing_base.py", line 337, in <listcomp>
                  resolved_file := cached_file(
                File "/home/user/.local/lib/python3.10/site-packages/transformers/utils/hub.py", line 322, in cached_file
                  file = cached_files(path_or_repo_id=path_or_repo_id, filenames=[filename], **kwargs)
                File "/home/user/.local/lib/python3.10/site-packages/transformers/utils/hub.py", line 524, in cached_files
                  raise OSError(
              OSError: PermissionError at /home/user/.cache/huggingface/hub/models--microsoft--trocr-base-handwritten when downloading microsoft/trocr-base-handwritten. Check cache directory permissions. Common causes: 1) another user is downloading the same model (please wait); 2) a previous download was canceled and the lock file needs manual removal.
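The first traceback ends in a `PermissionError` raised by `mkdir` under `~/.cache/huggingface/hub`, which suggests the current user cannot create directories there. One possible cause (an assumption, nothing in the log confirms it) is that an earlier command run with sudo left part of the cache owned by root. The sketch below only inspects the situation; `first_existing` and `diagnose` are made-up helper names. `HF_HOME` is the environment variable that the Hugging Face libraries read for the cache location.

```python
import os
from pathlib import Path


def first_existing(path):
    """Walk up from path until a directory that actually exists is found."""
    path = Path(path).expanduser()
    while not path.exists() and path != path.parent:
        path = path.parent
    return path


def diagnose(path):
    """Describe whether the current user could create `path`."""
    existing = first_existing(path)
    return {
        "checked": str(existing),
        # Creating a subdirectory needs write and search permission.
        "writable": os.access(existing, os.W_OK | os.X_OK),
        "owner_uid": existing.stat().st_uid,
        "my_uid": os.getuid() if hasattr(os, "getuid") else None,
    }


hf_home = os.environ.get("HF_HOME", "~/.cache/huggingface")
print(diagnose(Path(hf_home) / "hub"))
```

If `writable` is false and `owner_uid` differs from `my_uid`, fixing the ownership of that directory, or pointing `HF_HOME` at a directory you do own, would be the next thing to try.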

              • pinball_wizard@lemmy.zip
                23 hours ago

                This error looks like it is saying a previous attempt aborted, and it needs you to clean up some file that was only partly downloaded.

                Edit: Given the “please wait”, I would try again in a couple of hours.

                • someone@lemmy.today (OP)
                  23 hours ago

                  So try again… in a couple of hours…

                  Why would that make a difference? It’s a local model right?

                  • pinball_wizard@lemmy.zip
                    21 hours ago

                    If it is local only, then waiting probably won’t help.

                    Another thought for you: pip behaves much better inside a virtual environment - using the Python venv module, or uv.

                    The instructions you have shared so far look more compatible with venv.
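To make the venv suggestion concrete, here is a minimal sketch that creates a virtual environment with the standard-library `venv` module and returns the path to its interpreter. The function name `make_env` and the `~/trocr-env` location in the usage note are arbitrary choices for this example.

```python
import sys
import venv
from pathlib import Path


def make_env(env_dir, with_pip=True):
    """Create a virtual environment and return the path to its Python interpreter."""
    env_dir = Path(env_dir)
    venv.create(env_dir, with_pip=with_pip)
    # Windows puts the interpreter under Scripts/, other platforms under bin/.
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    exe = "python.exe" if sys.platform == "win32" else "python"
    return env_dir / bin_dir / exe
```

Calling `make_env(Path.home() / "trocr-env")` and then running the returned interpreter with `-m pip install transformers pdf2image` keeps those packages separate from everything in `~/.local`, so the pillow pins from mistral-common and moviepy no longer apply. On Debian and Ubuntu, `with_pip=True` may need the `python3-venv` package installed first.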