{"id":6433,"date":"2024-07-28T18:25:44","date_gmt":"2024-07-28T15:25:44","guid":{"rendered":"https:\/\/1.cbm.ua\/?p=6433"},"modified":"2024-08-27T08:12:34","modified_gmt":"2024-08-27T05:12:34","slug":"%d1%83%d1%81%d1%82%d0%b0%d0%bd%d0%be%d0%b2%d0%ba%d0%b0-%d0%b1%d0%b8%d0%b1%d0%bb%d0%b8%d0%be%d1%82%d0%b5%d0%ba%d1%83-spacy","status":"publish","type":"post","link":"https:\/\/1.cbm.ua\/?p=6433","title":{"rendered":"Installing the spaCy library. Natural language processing and analysis with the Python library spaCy"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\"><strong>spaCy<\/strong> is an open-source software library for advanced natural language processing, written in the Python and Cython programming languages.<\/h2>\n\n\n\n<p>In the book 
&#171;<strong>\u041e\u0431\u0440\u0430\u0431\u043e\u0442\u043a\u0430_\u0435\u0441\u0442\u0435\u0441\u0442\u0432\u0435\u043d\u043d\u043e\u0433\u043e_\u044f\u0437\u044b\u043a\u0430_Python_\u0438_spaCy_\u043d\u0430_\u043f\u0440\u0430\u043a\u0442\u0438\u043a\u0435_2021_\u0412\u0430\u0441\u0438\u043b\u044c\u0435\u0432.pdf<\/strong>&#187; I read about how to build a bot and learned about the <strong>spaCy<\/strong> library.<\/p>\n\n\n\n<p>Then I found the article &#171;Natural language processing and analysis with the <strong>Python<\/strong> library <strong>spaCy<\/strong>&#187;:<\/p>\n\n\n\n<p><a href=\"https:\/\/habr.com\/ru\/companies\/otus\/articles\/755584\/\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/habr.com\/ru\/companies\/otus\/articles\/755584\/<\/a><\/p>\n\n\n\n<p>Now I am trying to figure out this library and how I can use it for learning German and English,<br>and for the project &#171;<strong>pr. 
Software that helps you communicate and learn a language<\/strong>&#187;.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p><strong>IMPORTANT!<\/strong><br>If several versions of Python are installed on the computer, first find out which version the libraries will be installed into.<\/p>\n\n\n\n<p>In a program:<br>import sys<br>print(sys.version)<br>print(sys.executable)<\/p>\n\n\n\n<p>In the command prompt (cmd):<br>py --version<br>python --version<br>python3 --version<br>python3.12 --version<br>py -0<\/p>\n\n\n\n<p>sys.executable prints the path of the running interpreter, and py -0 lists all Python versions known to the Windows py launcher. To install into one specific version, run pip through that interpreter, for example: py -3.12 -m pip install spacy<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>Update <strong>pip<\/strong>:<br>py -m <strong>pip<\/strong> install -U <strong>pip<\/strong><\/p>\n\n\n\n<p>How do you install the <strong>spaCy<\/strong> library?<\/p>\n\n\n\n<p>To install <strong>spaCy<\/strong>, follow these 
steps:<\/p>\n\n\n\n<p>Install the dependencies. This step is usually optional: pip installs spaCy's runtime dependencies, <strong>numpy<\/strong> included, automatically, and <strong>cython<\/strong> is only needed when spaCy is built from source. To install them by hand:<br><strong>pip<\/strong> install <strong>numpy cython<\/strong><br>or, if they are already installed, upgrade them (-U is short for --upgrade):<br>py -m pip install -U numpy cython<br>py -m pip install --upgrade numpy cython<\/p>\n\n\n\n<p>Install <strong>spaCy<\/strong>:<br>pip install spacy<br>or, if <strong>spaCy<\/strong> is already installed, upgrade it:<br>py -m pip install -U spacy<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">py -m pip install --upgrade spacy<\/pre>\n\n\n\n<p>Download a language model, for example for Russian. Run the download with the same interpreter command you used to install spaCy (py, python or python3, depending on the system); otherwise the model may be installed into a different Python.<\/p>\n\n\n\n<p>Russian:<br>Small model:<\/p>\n\n\n\n<pre 
class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">python -m spacy download ru_core_news_sm<\/pre>\n\n\n\n<p>Medium model:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">python -m spacy download ru_core_news_md<\/pre>\n\n\n\n<p>Large model:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">python -m spacy download ru_core_news_lg<\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>English:<br>Small model:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">python -m spacy download en_core_web_sm<\/pre>\n\n\n\n<p>Medium model:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">python -m spacy 
download en_core_web_md<\/pre>\n\n\n\n<p>Large model:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">python -m spacy download en_core_web_lg<\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>German:<br>Small model:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">python -m spacy download de_core_news_sm<\/pre>\n\n\n\n<p>Medium model:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">python -m spacy download de_core_news_md<\/pre>\n\n\n\n<p>Large model:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">python -m spacy download de_core_news_lg<\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>Ukrainian:<br>Small model:<\/p>\n\n\n\n<pre 
class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">python3 -m spacy download uk_core_news_sm<\/pre>\n\n\n\n<p>Medium model:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">python3 -m spacy download uk_core_news_md<\/pre>\n\n\n\n<p>Large model:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">python3 -m spacy download uk_core_news_lg<\/pre>\n\n\n\n<p>The command<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><strong>python -m spacy validate<\/strong><\/code><\/pre>\n\n\n\n<p>checks whether all installed spaCy models and packages are compatible with the version of <strong>spaCy<\/strong> installed in your environment. This is 
especially useful after upgrading <strong>spaCy<\/strong>, to make sure that all installed models and packages still work with the new version.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"719\" height=\"271\" src=\"https:\/\/fjngqp1mvftjzxfzrdiggafze9wxueam.cdn-freehost.com.ua\/wp-content\/uploads\/2024\/07\/image-1.png\" alt=\"\" class=\"wp-image-6443\" srcset=\"https:\/\/fjngqp1mvftjzxfzrdiggafze9wxueam.cdn-freehost.com.ua\/wp-content\/uploads\/2024\/07\/image-1.png 719w, https:\/\/fjngqp1mvftjzxfzrdiggafze9wxueam.cdn-freehost.com.ua\/wp-content\/uploads\/2024\/07\/image-1-300x113.png 300w\" sizes=\"auto, (max-width: 719px) 100vw, 719px\" \/><\/figure>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>Approximate model sizes (they differ between languages and model versions). The md and lg models also contain word vectors, which the sm models lack.<\/p>\n\n\n\n<p>Small model (sm):<br>Size: about 14 MB<\/p>\n\n\n\n<p>Medium model (md):<br>Size: about 39 MB<\/p>\n\n\n\n<p>Large model (lg):<br>Size: about 489 MB<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h4 class=\"wp-block-heading\">Reducing words to their 
base form<\/h4>\n\n\n\n<p>Lemmatization is the process of reducing a word to its base (dictionary) form, the lemma. Unlike simply cutting off endings and suffixes (stemming), it also handles irregular forms: for example, st\u00f6\u00dft becomes sto\u00dfen. This unifies the different forms of a word and makes further analysis more accurate.<\/p>\n\n\n\n<p>Example:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">import spacy\n\n# Load the small German pipeline (install it first: python -m spacy download de_core_news_sm)\nmodel_name = \"de_core_news_sm\"\nnlp = spacy.load(model_name)\n\ntext = 'Der Plan der Allianz zur \u00dcbernahme des singapurischen Versicherers Income Insurance f\u00fcr 1,5 Milliarden Euro st\u00f6\u00dft in dem s\u00fcdostasiatischen Finanzzentrum auf massive Kritik. 
Das berichtet das \"Handelsblatt\".'\n\ndoc = nlp(text)\n\n# Mark every token whose lemma differs from its word form\nfor token in doc:\n    if token.text != token.lemma_:\n        print(f\"{token.text} &lt;> {token.lemma_};\")\n    else:\n        print(f\"= {token.text}\")<\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Der &lt;&gt; der;\n= Plan\n= der\n= Allianz\nzur &lt;&gt; zu;\n= \u00dcbernahme\ndes &lt;&gt; der;\nsingapurischen &lt;&gt; singapurisch;\nVersicherers &lt;&gt; Versicherer;\nIncome &lt;&gt; Incom;\n= Insurance\n= f\u00fcr\n= 1,5\nMilliarden &lt;&gt; Milliarde;\n= Euro\nst\u00f6\u00dft &lt;&gt; sto\u00dfen;\n= in\ndem &lt;&gt; der;\ns\u00fcdostasiatischen &lt;&gt; s\u00fcdostasiatisch;\n= Finanzzentrum\n= auf\nmassive &lt;&gt; massiv;\n= Kritik\n. &lt;&gt; --;\nDas &lt;&gt; der;\nberichtet &lt;&gt; berichten;\ndas &lt;&gt; der;\n\" &lt;&gt; --;\n= Handelsblatt\n\" &lt;&gt; --;\n. &lt;&gt; --;<\/code><\/pre>\n\n\n\n<p>The model is not perfect: for example, it reduces Income to Incom, and punctuation marks get the lemma --.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>An example program that produces more useful output, file &#171;test_v1.py&#187;. For every token it prints the lemma and a description of the part of speech:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import spacy\n\nfrom lebery_jp import find_value_by_key__token_lemma\n\n# German pipeline (install it first: python -m spacy download de_core_news_sm)\nmodel_name = \"de_core_news_sm\"\n\n# Input text\ntext = 'Der Plan der Allianz zur \u00dcbernahme des singapurischen Versicherers Income Insurance f\u00fcr 1,5 Milliarden Euro st\u00f6\u00dft in dem s\u00fcdostasiatischen Finanzzentrum auf massive Kritik. Das berichtet das \"Handelsblatt\".'\n# For an English text, load an English model such as en_core_web_sm instead:\n# text = \"I like to read books.\"\n\n# Load the language model\nnlp = spacy.load(model_name)\n\n# Run the analysis\ndoc = nlp(text)\n\n# Print each word with its lemma and part of speech\nfor token in doc:\n    # This model gives punctuation the lemma \"--\"; show \".,?!\" instead\n    lemma = token.lemma_ if token.lemma_ != \"--\" else \".,?!\"\n    # Built outside the f-string: reusing double quotes inside an f-string needs Python 3.12+\n    marker = \"&lt;-\" + lemma + \"; \" if token.text != lemma else \"= ;\"\n    print(f\"{token.text} ({marker}-> {find_value_by_key__token_lemma(token.pos_)})\")<\/code><\/pre>\n\n\n\n<p>The helper module, file &#171;lebery_jp.py&#187; in the same folder:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>def find_value_by_key__token_lemma(key):\n    \"\"\"\n    Return a description of a spaCy part-of-speech tag (token.pos_):\n    the German term followed by an English gloss.\n\n    :param key: part-of-speech tag, for example \"NOUN\"\n    :return: the description for the tag, or \"Not found: \" plus the tag if it is unknown\n    \"\"\"\n    data = {\n        \"DET\": \"DET: Artikel oder Determinativ (article or determiner)\",\n        \"NOUN\": \"NOUN: Substantiv (noun)\",\n        \"PROPN\": \"PROPN: Eigenname (proper noun)\",\n        \"PRON\": \"PRON: Pronomen (pronoun)\",\n        \"ADJ\": \"ADJ: Adjektiv (adjective)\",\n        \"ADV\": \"ADV: Adverb (adverb)\",\n        \"ADP\": \"ADP: Pr\u00e4position (preposition)\",\n        \"NUM\": \"NUM: Zahlwort (numeral)\",\n        \"VERB\": \"VERB: Verb (verb)\",\n        \"AUX\": \"AUX: Hilfsverb (auxiliary verb)\",\n        \"CCONJ\": \"CCONJ: nebenordnende Konjunktion (coordinating conjunction)\",\n        \"SCONJ\": \"SCONJ: unterordnende Konjunktion (subordinating conjunction)\",\n        \"PART\": \"PART: Partikel (particle)\",\n        \"PUNCT\": \"PUNCT: Satzzeichen (punctuation mark)\",\n    }\n\n    return data.get(key, \"Not found: \" + key)<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n","protected":false},"excerpt":{"rendered":"<p>spaCy is an open-source software library for 
advanced natural language processing, written in the Python and Cython programming languages. In the book &#171;\u041e\u0431\u0440\u0430\u0431\u043e\u0442\u043a\u0430_\u0435\u0441\u0442\u0435\u0441\u0442\u0432\u0435\u043d\u043d\u043e\u0433\u043e_\u044f\u0437\u044b\u043a\u0430_Python_\u0438_spaCy_\u043d\u0430_\u043f\u0440\u0430\u043a\u0442\u0438\u043a\u0435_2021_\u0412\u0430\u0441\u0438\u043b\u044c\u0435\u0432.pdf&#187; I read about how to build a bot and learned about the spaCy library. I found the article &#171;Natural language processing and analysis with the Python library spaCy&#187; https:\/\/habr.com\/ru\/companies\/otus\/articles\/755584\/ and I am now trying to figure out this library and how I can&hellip;&nbsp;<a href=\"https:\/\/1.cbm.ua\/?p=6433\" rel=\"bookmark\">Read more &raquo;<span class=\"screen-reader-text\">Installing the spaCy library. Natural language processing and analysis with the Python library spaCy<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"neve_meta_sidebar":"","neve_meta_container":"","neve_meta_enable_content_width":"","neve_meta_content_width":0,"neve_meta_title_alignment":"","neve_meta_author_avatar":"","neve_post_elements_order":"","neve_meta_disable_header":"","neve_meta_disable_footer":"","neve_meta_disable_title":"","footnotes":""},"categories":[1],"tags":[],"class_list":["post-6433","post","type-post","status-publish","format-standard","hentry","category-1"],"_links":{"self":[{"href":"https:\/\/1.cbm.ua\/index.php?rest_route=\/wp\/v2\/posts\/6433","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/1.cbm.ua\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/1.cbm.ua\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/1.cbm.ua\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/1.cbm.ua\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=6433"}],"version-history":[{"count":13,"href":"https:\/\/1.cbm.ua\/index.php?rest_route=\/wp\/v2\/posts\/6433\/revisions"}],"predecessor-version":[{"id":6679,"href":"https:\/\/1.cbm.ua\/index.php?rest_route=\/wp\/v2\/posts\/6433\/revisions\/6679"}],"wp:attachment":[{"href":"https:\/\/1.cbm.ua\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=6433"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/1.cbm.ua\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=6433"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/1.cbm.ua\/inde
x.php?rest_route=%2Fwp%2Fv2%2Ftags&post=6433"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}