{"id":355,"date":"2023-05-20T23:49:03","date_gmt":"2023-05-20T23:49:03","guid":{"rendered":"https:\/\/campusvirtual.net\/escritos\/?p=355"},"modified":"2024-06-28T19:24:02","modified_gmt":"2024-06-28T19:24:02","slug":"ocr-archivos-pdf-con-capacidad-de-busqueda","status":"publish","type":"post","link":"https:\/\/campusvirtual.net\/escritos\/2023\/05\/20\/ocr-archivos-pdf-con-capacidad-de-busqueda\/","title":{"rendered":"OCR: Searchable PDF Files"},"content":{"rendered":"<div class=\"super-content-wrapper\">\n<div class=\"super-content max-width has-footer\">\n<article id=\"block-ocr-archivos-pdf-con-capacidad-de-bsqueda-1\" class=\"notion-root\">\n<div id=\"block-94cf745a46b449a28554c2da414960e9\" class=\"notion-text\">\n<p class=\"notion-text__content\"><span class=\"notion-semantic-string\"><strong>To work with AI chatbots and with programs such as Zotero, <\/strong><\/span><span class=\"notion-semantic-string\"><strong>it helps if PDF files have \u201csearchable\u201d text.<\/strong><\/span><\/p>\n<\/div>\n<div id=\"block-04945204f8264ffb8e6a3b8ee2857de3\" class=\"notion-text\">\n<p class=\"notion-text__content\"><span class=\"notion-semantic-string\">PDF files of documents, books, and other writings often contain only images of the scanned pages. 
This is the case with the electronic case files in SISE.<\/span><\/p>\n<\/div>\n<div id=\"block-51817cbd2f344ef997df7031c8cb3d85\" class=\"notion-text\">\n<p class=\"notion-text__content\"><span class=\"notion-semantic-string\">Although these PDF documents can be read in Zotero, they are images only, so it is not possible to search, select, or copy text.<\/span><\/p>\n<\/div>\n<div id=\"block-363eeb1691a246cb9412eeb61a5f4ed3\" class=\"notion-text\">\n<p class=\"notion-text__content\"><span class=\"notion-semantic-string\">The ideal approach is to run <strong>Optical Character Recognition (OCR)<\/strong> to obtain a <strong>PDF with selectable, copyable, and searchable text.<\/strong><\/span><\/p>\n<\/div>\n<div id=\"block-b6f0e3565f08469a8c2ca3e51593b110\" class=\"notion-text\">\n<p class=\"notion-text__content\"><span class=\"notion-semantic-string\">OCR does not alter the images in a PDF; the recognized text is \u201cembedded\u201d in the PDF as an invisible layer.<\/span><\/p>\n<p><strong>In addition, before working with online AI chatbots (Claude, ChatGPT, Gemini, etc.) on documents that contain personal or confidential data, that data must be removed. Why? 
Because you lose control of the uploaded files and cannot delete them; in addition, the companies may use that data to train their models.<\/strong><\/p>\n<p>To remove personal data from PDF documents, either edit the images before running OCR in the chatbots themselves, or run OCR on the document first and then delete the words or phrases that contain personal data.<\/p>\n<p>&nbsp;<\/p>\n<\/div>\n<div id=\"block-e900e094fd624b9b93771f8cc40e5aad\" class=\"notion-text\">\n<p class=\"notion-text__content\">\n<\/div>\n<div id=\"block-c85da230823a4449ba2837f437332f1a\" class=\"notion-text\">\n<h3 class=\"notion-text__content\"><span class=\"notion-semantic-string\"><strong>OCR process diagram:<\/strong><\/span><\/h3>\n<\/div>\n<div id=\"block-6017942c5c084967bece012379b03662\" class=\"notion-image normal\"><img alt=\"\" aria-hidden=\"true\" \/><img decoding=\"async\" src=\"https:\/\/images.spr.so\/cdn-cgi\/imagedelivery\/j42No7y-dcokJuNgXeA0ig\/ebf1d9e2-c573-4ccc-b40f-60b5cf5ec879\/Untitled\/w=1080,quality=80,fit=scale-down\" srcset=\"https:\/\/images.spr.so\/cdn-cgi\/imagedelivery\/j42No7y-dcokJuNgXeA0ig\/ebf1d9e2-c573-4ccc-b40f-60b5cf5ec879\/Untitled\/w=640,quality=80,fit=scale-down 1x, https:\/\/images.spr.so\/cdn-cgi\/imagedelivery\/j42No7y-dcokJuNgXeA0ig\/ebf1d9e2-c573-4ccc-b40f-60b5cf5ec879\/Untitled\/w=1080,quality=80,fit=scale-down 2x\" alt=\"image\" data-nimg=\"intrinsic\" \/><\/div>\n<h1 id=\"block-74317cdeb5f0487097e4d4015d6c2ff8\" class=\"notion-heading\"><span id=\"74317cdeb5f0487097e4d4015d6c2ff8\" class=\"notion-heading__anchor\"><\/span><span class=\"notion-semantic-string\">OCR SOFTWARE FOR PDF<\/span><\/h1>\n<div id=\"block-c2b0b25e53fe4fad90d200156bae68ef\" class=\"notion-text\">\n<p class=\"notion-text__content\">\n<\/div>\n<div id=\"block-0840cb713a4446a296ffb41a6540e4a6\" class=\"notion-text\">\n<p class=\"notion-text__content\"><span class=\"notion-semantic-string\">The following (third-party) videos explain 
what OCR is and the different software options that perform it.<\/span><\/p>\n<\/div>\n<div id=\"block-f2f8aaed1b9141519e97f47c19d21765\" class=\"notion-text\">\n<p class=\"notion-text__content\">\n<\/div>\n<h2 id=\"block-492763150f374654bfe3cfac7dbf90e3\" class=\"notion-heading\"><span id=\"492763150f374654bfe3cfac7dbf90e3\" class=\"notion-heading__anchor\"><\/span><span class=\"notion-semantic-string\"><span class=\"highlighted-color color-red\">Professional PDF OCR programs<\/span><\/span><\/h2>\n<div id=\"block-e2d448d057f949dfa59e1d2482247e82\" class=\"notion-text\">\n<p class=\"notion-text__content\"><span class=\"notion-semantic-string\"><strong>Pros<\/strong>: high-quality OCR; fast OCR; the ability to run OCR on many files at once; security.<\/span><\/p>\n<\/div>\n<div id=\"block-27bdd648b5944b68a8925d984850b1c3\" class=\"notion-text\">\n<p class=\"notion-text__content\"><span class=\"notion-semantic-string\"><strong>Cons<\/strong>: price.<\/span><\/p>\n<\/div>\n<div class=\"notion-table__wrapper\">\n<table class=\"notion-table\">\n<tbody>\n<tr>\n<td>\n<div class=\"notion-table__cell\"><span class=\"notion-semantic-string\">ABBYY FineReader PDF<\/span><\/div>\n<\/td>\n<td>\n<div class=\"notion-table__cell\"><span class=\"notion-semantic-string\">MX$ 3,300 per year<\/span><\/div>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<div class=\"notion-table__cell\"><span class=\"notion-semantic-string\">Adobe Acrobat Pro<\/span><\/div>\n<\/td>\n<td>\n<div class=\"notion-table__cell\"><span class=\"notion-semantic-string\">MX$ 6,600 per year<\/span><\/div>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<div class=\"notion-table__cell\"><span class=\"notion-semantic-string\"><strong>Power PDF<\/strong> kofax.com<\/span><\/div>\n<\/td>\n<td>\n<div class=\"notion-table__cell\"><span class=\"notion-semantic-string\">MX$ 2,450 one-time payment<\/span><\/div>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<div 
id=\"block-f78c77b05b144939b0f6733b694b5701\" class=\"notion-text\">\n<p>&nbsp;<\/p>\n<p class=\"notion-text__content\"><span class=\"notion-semantic-string\"><strong>Power PDF is an efficient option<\/strong><\/span><\/p>\n<\/div>\n<div id=\"block-c08f0f526cad4a6689a089a16b0dd682\" class=\"notion-text\">\n<p class=\"notion-text__content\"><span class=\"notion-semantic-string\">You can download a 15-day trial version of Power PDF:<\/span><\/p>\n<\/div>\n<div id=\"block-a3b499f5b3cd4305877230d50bc00142\" class=\"notion-text\">\n<p><a href=\"https:\/\/www.tungstenautomation.com\/products\/power-pdf\/standard-free-trial\" target=\"_blank\" rel=\"noopener\">https:\/\/www.tungstenautomation.com\/products\/power-pdf\/standard-free-trial<\/a><\/p>\n<\/div>\n<div id=\"block-7e488e27915c497681b7bcd868170dd7\" class=\"notion-text\">\n<p class=\"notion-text__content\">\n<\/div>\n<div id=\"block-c82a01c8abb64e09a7f90e1682e2feaf\" class=\"notion-text\">\n<p class=\"notion-text__content\">\n<\/div>\n<h1><\/h1>\n<h1 id=\"block-852c7ecc993a40fe917887059ef2503a\" class=\"notion-heading\"><span id=\"852c7ecc993a40fe917887059ef2503a\" class=\"notion-heading__anchor\"><\/span><span class=\"notion-semantic-string\"><span class=\"highlighted-color color-red\"><strong>Free online services<\/strong><\/span><\/span><\/h1>\n<div id=\"block-49cc2e2f021d4cd5934bc3b7a9c3c093\" class=\"notion-text\">\n<p class=\"notion-text__content\"><span class=\"notion-semantic-string\"><a class=\"notion-link link\" href=\"https:\/\/tools.pdf24.org\/es\/ocr-pdf\" target=\"_blank\" rel=\"noopener noreferrer\">https:\/\/tools.pdf24.org\/es\/ocr-pdf<\/a><\/span><\/p>\n<\/div>\n<div id=\"block-5d7ef8acc90f45b78285dfcaf2046e23\" class=\"notion-text\">\n<p class=\"notion-text__content\"><span class=\"notion-semantic-string\"><strong>Pros<\/strong>: free.<\/span><\/p>\n<\/div>\n<div id=\"block-1333cda53b3d4344950874feaf1c8590\" class=\"notion-text\">\n<p 
class=\"notion-text__content\"><span class=\"notion-semantic-string\"><strong>Cons<\/strong>: you have NO control over where the file ends up.<\/span><\/p>\n<\/div>\n<div id=\"block-6c26e3eea6d94a5ba113b9153e3acd6c\" class=\"notion-text\">\n<p class=\"notion-text__content\"><span class=\"notion-semantic-string\">Medium- or low-quality OCR. Online services are not recommended for legal documents held in the custody of the courts.<\/span><\/p>\n<\/div>\n<div id=\"block-7f7390e599ee4b959660923192f631f8\" class=\"notion-embed\">\n<div class=\"notion-embed__content\">\n<div class=\"notion-embed__container__wrapper\">\n<div class=\"LazyLoad is-visible notion-embed__container\"><iframe title=\"www.youtube.com\" src=\"https:\/\/www.youtube.com\/embed\/DFdwkIeeGdQ?rel=0\" frameborder=\"0\" sandbox=\"allow-scripts allow-popups allow-forms allow-same-origin allow-top-navigation-by-user-activation allow-popups-to-escape-sandbox\" allowfullscreen=\"allowfullscreen\" data-mce-fragment=\"1\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"block-93dd5ea285c744409f62145a6ba1d84c\" class=\"notion-text\">\n<p class=\"notion-text__content\">\n<\/div>\n<div id=\"block-b177eab0843040b99fe177648c17a74f\" class=\"notion-text\">\n<p class=\"notion-text__content\">\n<\/div>\n<div id=\"block-6a7aed0be1c34a27aff6cd70585b495b\" class=\"notion-text\">\n<p class=\"notion-text__content\">\n<\/div>\n<\/article>\n<\/div>\n<\/div>\n<div class=\"super-footer stack no-footnote\"><\/div>\n","protected":false},"excerpt":{"rendered":"<p>To work with AI chatbots and with programs such as Zotero, it helps if PDF files have \u201csearchable\u201d text. PDF files of documents, books, and other writings often contain only images of the scanned pages. 
This is the&hellip;<span class=\"read-more\"><a href=\"https:\/\/campusvirtual.net\/escritos\/2023\/05\/20\/ocr-archivos-pdf-con-capacidad-de-busqueda\/\">Read More &raquo;<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[28,32],"tags":[],"class_list":["post-355","post","type-post","status-publish","format-standard","hentry","category-ia-y-justicia","category-zotero"],"_links":{"self":[{"href":"https:\/\/campusvirtual.net\/escritos\/wp-json\/wp\/v2\/posts\/355","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/campusvirtual.net\/escritos\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/campusvirtual.net\/escritos\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/campusvirtual.net\/escritos\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/campusvirtual.net\/escritos\/wp-json\/wp\/v2\/comments?post=355"}],"version-history":[{"count":6,"href":"https:\/\/campusvirtual.net\/escritos\/wp-json\/wp\/v2\/posts\/355\/revisions"}],"predecessor-version":[{"id":541,"href":"https:\/\/campusvirtual.net\/escritos\/wp-json\/wp\/v2\/posts\/355\/revisions\/541"}],"wp:attachment":[{"href":"https:\/\/campusvirtual.net\/escritos\/wp-json\/wp\/v2\/media?parent=355"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/campusvirtual.net\/escritos\/wp-json\/wp\/v2\/categories?post=355"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/campusvirtual.net\/escritos\/wp-json\/wp\/v2\/tags?post=355"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}