Web organization. Francisco Javier Cervigon Ruckauer


  • Describe how operators filter results.
  • Use the site: operator at the top-level domain and website levels.
  • Use a word you expect to appear on the target page to refine results.
  • Use the site: operator within images and news results.


Text version


Lesson 3.1: Web organization
Contents:
  1. Site: search
  2. Using site: search for top-level domains
  3. Tips for using site: correctly
  4. Using site: with top level domains
  5. Using site: with Image and News search
Welcome to Class 3, which will focus on operators.
What is an operator? An operator is a command that you add to your query to give Google special instructions about how you want it to deal with a specific search term. Some operators are symbols, while others are words. What symbols or words act as operators is determined by Google, rather than the searcher. Today you will learn about several operators, and gain a better sense of what operators are.
For example, if you do a query for [tesla coil], which you saw earlier, you get results about Tesla coils, as well as Nikola Tesla and other associated ideas.
Figure: Search results for [tesla coil].
But suppose you want to limit results to pages that come from the Stanford University website (stanford.edu). You add an operator, in this case the site: operator: [tesla coil site:stanford.edu]. It limits the results to pages with the words tesla and coil, but just from the specified website: stanford.edu.
Figure: Notice that every web address is from stanford.edu in these results for the query [tesla coil site:stanford.edu].
Here is the key idea to remember about the way operators work: you can imagine this sort of giant blob of all the results for the search [tesla coil]. When you add in an operator like [site:stanford.edu], it is giving you a subset of those results. What it effectively is doing is saying, "Here's the entire space of all possible results, now, just give me the ones from just this site."
Figure: Venn Diagram showing that the results for [tesla coil site:stanford.edu] are a subset of the results for [tesla coil].
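The subset idea can be sketched in a few lines of Python. This is only an illustration of the concept, not how Google implements the operator: the URL list is made up for the example, and matches_site is a hypothetical helper that keeps a URL when its host is the given site or a subdomain of it.

```python
from urllib.parse import urlparse


def matches_site(url: str, site: str) -> bool:
    """Return True if the URL's host is `site` or ends with `.site`."""
    host = (urlparse(url).hostname or "").lower()
    site = site.lower().lstrip(".")  # accept both "gov" and ".gov"
    return host == site or host.endswith("." + site)


# Hypothetical result URLs for [tesla coil]; not real search output.
all_results = [
    "https://www.stanford.edu/physics/tesla-coil",
    "https://en.wikipedia.org/wiki/Tesla_coil",
    "https://web.stanford.edu/class/demo/coil.html",
    "https://example.com/nikola-tesla",
]

# Adding site:stanford.edu keeps only the matching subset of the results.
stanford_only = [u for u in all_results if matches_site(u, "stanford.edu")]
print(stanford_only)
```

Every URL in stanford_only also appears in all_results, which is the relationship the Venn diagram shows.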
In addition to using the site: operator to narrow to a single site, you can also use it to limit to all the sites within a top-level domain:
Country-code top-level domains are a good way to restrict what you are looking for, such as using .in (India), .br (Brazil), .es (Spain), or even .aq (Antarctica).
Be aware, too, of how top level domains operate. For example:
  1. .edu -- educational institutions, but only in the US (for example, use ac.uk for educational institutions in the United Kingdom)
  2. .mil -- military servers, but only in the US
  3. .gov -- government websites in the US
  4. .go.ke -- government websites in some countries take on this structure. Other variations include gov.by, gob.gt, etc.  
  5. .co.uk -- equivalent to .com, in some countries’ domains (e.g., .co.nz, .co.il, or co.sa)
If you need to find out the domain for a particular country, you can do a search like [government bosnia] and look at what domains come up in results.
You can use site: with parts of domains, such as ac.uk for academic sites within the United Kingdom or go.ke for government sites in Kenya.
Here are some tips for writing site: searches that work:
1. Spacing
Site: operators do not work if there is a space after the colon:
  1. Effective:        [site:stanford.edu]
  2. Ineffective:        [site: stanford.edu]
2. Query order
The site: portion of the query can come either before or after the other search terms:
  1. Effective:        [baby safe sunscreen site:nih.gov]
  2. Effective:        [site:nih.gov baby safe sunscreen]
3. Including the period
Top-level domains work with or without the period:
  1. Effective:        [site:com]
  2. Effective:        [site:.com]
Portions of a domain require the period between the two elements of the domain name:
  1. Effective:        [site:gc.ca]
  2. Effective:        [site:.gc.ca]
A website must include the period between the site name and the top-level domain:
  1. Effective:        [site:stanford.edu]
  2. Ineffective:        [site:stanfordedu]
  3. Ineffective:        [site:stanford]
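The spacing and ordering tips above can be captured in a small helper. The sketch below is an assumption-laden illustration: build_site_query and normalize_site_spacing are hypothetical names, and the code only assembles the query text; it does not contact Google. It also cannot tell a bare top-level domain such as com from an incomplete site name such as stanford, so that check is left to the searcher.

```python
import re


def build_site_query(terms: str, site: str) -> str:
    """Join search terms and a site: restriction with no space after the colon."""
    site = site.strip()
    bare = site.lstrip(".")  # a leading period is optional
    if not bare or re.search(r"\s", bare):
        raise ValueError("site must be a non-empty domain with no spaces")
    # The site: part may come before or after the terms; here it goes last.
    return f"{terms.strip()} site:{site}".strip()


def normalize_site_spacing(query: str) -> str:
    """Remove any whitespace typed between 'site:' and the domain."""
    return re.sub(r"\bsite:\s+", "site:", query)


print(build_site_query("baby safe sunscreen", "nih.gov"))
print(normalize_site_spacing("tesla coil site: stanford.edu"))
```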
Now, take a look at what a search looks like with just the top-level domain: [business workplace accident rates site:gov].
Figure: Results for the query [business workplace accident rates site:gov].
Notice that all the results are coming from .gov sites, which denote government pages here in the United States.
Government websites may contain databases that house rich data. While Google can point you to the front pages of these databases, the databases themselves are often created in a manner that does not allow Google to crawl the data directly. So, you can use Google to find the homepage of the database, and you can go in and search the contents yourself.
For example, OSHA has a ton of data available to you, and you can find the databases that contain it through a site restriction mechanism like this, but you have to go to the front page of the database to actually search its contents.
If you want to learn more about this concept of the deep web, check out the bonus video in the forum.
You can also explore these features by selecting a simple starter search, like [mariculture], or fish farming.
Start with a search like [mariculture site:com]:
 Figure: Results for the query [mariculture site:com].
You can then use this search in other types of media on Google. For example, in Image search:
Figure: Results for the image search [mariculture site:com].
These images are all from .com sites. You can also change the query to find images from [site:gov].
Figure: Results from the query  [mariculture site:gov].
Now, you are seeing only results from governmental sites in the US that have images involving mariculture.
The same trick works with other media, like News:
Figure: Results from News search for [business workplace site:gov].
You can confirm that these results are from News because News is highlighted in red in the column on the left side of the screen.
So, in this lesson you have explored how the site: operator can be used to restrict results to either a specific top-level domain or a specific website.
As you will be seeing in the next few classes, there are many different kinds of operators. Today you will learn about the top four or five that people use a lot. You can explore even more operators here and here, but these are the ones that we use day-to-day and actually give you a lot more power.

Go ahead and try the site: searching activity.


---------------------------------------------------------------------------

Advanced Operators for Web Search


Updated: October 21, 2011

Here, in one place, are all of the currently documented advanced search operators for web search. Note that Scholar, Groups, etc. may have some unique operators listed elsewhere. Also note that some operators come in pairs (e.g., allinanchor: along with inanchor:). We’ve written about them together rather than having two entries for the same kind of operator. We also follow the convention of surrounding each query with square brackets. When running a query, you would not actually type the square brackets (although including them won’t hurt anything).
_____________________________________________________________________________

allinanchor:  /  inanchor:

  • Restricts results to pages containing all query terms in the anchor text on links to the page. For instance, [ allinanchor: best restaurant Sunnyvale ] will return only pages in which the anchor text on links to the pages contains the words “best,” “restaurant,” and “Sunnyvale”, that is, all of the words following the allinanchor: operator. For that reason, when using allinanchor: in your query, do not include any other search operators. By contrast, the operator inanchor: applies only to the term that immediately follows it. Example: [ inanchor:sales offer 2011 ] will search only for “sales” in the anchor text.

  • Anchor text is the text on a page that is linked to another web page or a different place on the current page. When you click on anchor text, you will be taken to the page or place on the page to which it is linked.

allintext: / intext:

  • Restricts results to those containing all the query terms you specify in the text of the page. For example, [ allintext: camping tent stove ] will return only pages in which the words “camping,” “tent,” and “stove” appear in the text of the page. Using the operator intext: will search only for the next term in the text of the page. (Note: using intext: in front of every word in your query is the same as using allintext: at the front of your query, e.g., [ intext:Victorian intext:artists ] is the same as [ allintext: Victorian artists ].)

allintitle: / intitle:

  • Restricts results to those containing all the query terms you specify in the title. For example, [ allintitle: university relations ] will return only documents that contain the words “university” and “relations” in the title of the page.  Using the operator intitle: will search only for the next term in the title of the page.  For instance, [ flu shot intitle:help ] will return documents that mention the word “help” in their titles, and mention the words “flu” and “shot” anywhere in the document (title or not).
allinurl: / inurl:

  • Restricts results to those containing all the query terms you specify in the URL. For example, [ allinurl: google faq ] will return only documents that contain the words “google” and “faq” in the URL, such as “www.google.com/help/faq.html”.
term1 AROUND(n) term2

  • Limits results to those documents where term1 appears within a certain number of words of term2. For instance, [ search AROUND(3) engine ] will find only documents that have the word “search” within 3 words of “engine”. This is particularly useful when searching for common words that are relevant to your search only when in close proximity.
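To make the proximity idea concrete, here is a minimal Python sketch that checks whether two words occur within n words of each other in a piece of text. It rests on an assumption: distance is counted as the difference between word positions, which may not match how Google counts. The function name within_n_words is hypothetical.

```python
import re


def within_n_words(text: str, term1: str, term2: str, n: int) -> bool:
    """Return True if term1 and term2 occur at most n word positions apart."""
    words = re.findall(r"\w+", text.lower())
    pos1 = [i for i, w in enumerate(words) if w == term1.lower()]
    pos2 = [i for i, w in enumerate(words) if w == term2.lower()]
    # Compare every occurrence of term1 with every occurrence of term2.
    return any(abs(i - j) <= n for i in pos1 for j in pos2)


print(within_n_words("a fast search engine", "search", "engine", 3))
print(within_n_words("a search for the best engine", "search", "engine", 3))
```

In the second sentence the two words are four positions apart, so the check fails for n = 3 and would pass for n = 4.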
define:

  • Gives definitions from pages on the web for the term that follows. Useful for finding definitions of words, phrases, and acronyms. For example, [ define: peruse ] will give a definition of the word “peruse.” This also works for many phrases, such as [ define:Hobson’s choice ].
filetype:suffix

  • Limits results to pages whose names end in suffix.  The suffix is anything following the last period in the file name of the web page and can be many characters in length.

  • Example: [ search engine guidelines filetype:pdf ] will return Adobe Acrobat PDF files that match the terms “search,” “engine,” and “guidelines,” and whose names end with pdf.
Fill in the blanks (*)

  • The *, or wildcard, is a little-known feature that can be very powerful. If you include * within a query, it tells Google to try to treat the star as a placeholder for any unknown term(s) and then find the best matches. For example, the search [ Google * ] will give you results about many of Google's products (go to next page and next page -- we have many products). The query [ Obama voted * on the * bill ] will give you stories about different votes on different bills. Note that the * operator works only on whole words, not parts of words.
inanchor:   (see allinanchor: above )

info:

  • info: will give some additional information about the specified web page. For instance, the query
            [ info:googleblog.blogspot.com/2011/06/introducing-google-project-real-life.html ]
        will show information about this Google blog page, including a cached version, links to pages that link to this page, other pages on this site, etc.

intext: (see allintext: above)

intitle: ( see allintitle: above)

inurl: (see allinurl: above)

link:


  • Note that the link:  operator does not return a complete list of all the links available.  It simply returns a representative sample.

Minus sign ( – ) to exclude

  • Placing a minus sign immediately before a word indicates that you do not want pages that contain this word to appear in your results. The minus sign should appear immediately before the word and should be preceded by a space. For example, in the query [ anti-virus software ], the minus sign is used as a hyphen and will not be interpreted as an exclusion symbol, whereas the query [ anti-virus –software ] will search for the words 'anti-virus' but exclude references to software. You can exclude as many words as you want by using the – sign in front of all of them, for example [ jaguar –cars –football –os ]. The – sign can be used to exclude more than just words. For example, place a minus sign before the 'site:' operator (without a space) to exclude a specific site from your search results. (NOTE: We have elongated the minus sign here so you can see it. If you copy and paste these searches into a search bar, please replace it with a regular minus sign.)
Number range ( .. )

  • The number range operator searches for results containing numbers in a given range. Just add two numbers, separated by two periods, with no spaces, into the search box along with your search terms. Example: [  Willie Mays 1950..1960 ]   You can also specify a unit of measurement or some other indicator of what the number range represents.  For example, here's how you'd search for a DVD player that costs between $50 and $100: [ DVD player $50..$100 ]
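As a small illustration, the two example queries can be assembled with a helper like the one below. The name number_range_query and its unit parameter are hypothetical; the sketch only builds the query string with two periods and no spaces between the numbers.

```python
def number_range_query(terms: str, low: int, high: int, unit: str = "") -> str:
    """Build a query with a low..high number range; `unit` is an optional prefix such as "$"."""
    if low > high:
        raise ValueError("low must not exceed high")
    # Two periods, no spaces, with the unit repeated on both ends.
    return f"{terms} {unit}{low}..{unit}{high}"


print(number_range_query("Willie Mays", 1950, 1960))
print(number_range_query("DVD player", 50, 100, unit="$"))
```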
OR

  • The Boolean operator OR specifies alternatives to use as synonyms in search.  For instance, the query:

       [ mesothelioma OR “lung disease”  treatment ]

        could be used to search for a treatment for either mesothelioma or the quoted phrase “lung disease”. (Be sure to make the OR all uppercase. Lowercase “or” won’t work.)

Phrase search (using double quotes, “…” )

  • By putting double quotes around a set of words, you are telling Google to consider the exact words in that exact order without any change. Google already uses the order and the fact that the words are together as a very strong signal and will stray from it only for a good reason, so quotes are usually unnecessary. By insisting on phrase search you might be missing good results accidentally. For example, a search for [ "Alexander Bell" ] (with quotes) will miss any pages that refer to Alexander G. Bell.
related:

  • A search for  related:URL lists pages that are similar to the web page you specify. For instance, [related:en.wikipedia.org] will list web pages that are similar to the Wikipedia homepage.
Search exactly as is ("word")

  • Google employs synonyms automatically, so that it finds pages that mention, for example, childcare for the query [ child care ] (with a space), or California history for the query [ ca history ]. But sometimes Google helps out a little too much and gives you a synonym when you don't really want it. By enclosing the single word you want to freeze in quotes as in the query [ "ca" history ], you are telling Google to match that word precisely as you typed it. 
site:

  • Using the site: operator restricts your search results to the site or domain you specify. For example, [ penguins site:.aq ] will search for pages about penguins from web sites that have an AQ top-level domain name. (AQ is Antarctica, and its sites mostly belong to research stations located there.) A query like [ accidents site:bls.gov ] will find pages about accidents within the bls.gov domain (BLS = Bureau of Labor Statistics). You can specify a domain with or without a period, e.g., either as .gov or gov.
Combinations of operators:

  • Many of the search operators, such as –, OR, and " ", can be combined. For example, to find articles on security from all sites except Wikipedia.org you would search for:

           [  article security –site:Wikipedia.org  ]  

  • Similarly, you might want to exclude some kinds of documents with a search such as [ salsa recipe -tomatoes -filetype:pdf ] which would find salsa recipes that do not include the term “tomatoes” and are not PDF files.  
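The combinations above follow a simple pattern that can be sketched in Python. The function build_query and its parameters are hypothetical names chosen for this example; the code only produces the query text, using a regular minus sign placed directly before each excluded word or operator.

```python
def build_query(terms, exclude_words=(), exclude_sites=(), exclude_filetypes=()):
    """Combine search terms with minus-sign exclusions for words, sites, and file types."""
    parts = [terms]
    parts += [f"-{word}" for word in exclude_words]
    parts += [f"-site:{site}" for site in exclude_sites]
    parts += [f"-filetype:{suffix}" for suffix in exclude_filetypes]
    # Each exclusion is preceded by a space and has no space after the minus sign.
    return " ".join(parts)


print(build_query("article security", exclude_sites=["Wikipedia.org"]))
print(build_query("salsa recipe", exclude_words=["tomatoes"], exclude_filetypes=["pdf"]))
```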

More advanced search options 

  • Note that the Advanced Search page (http://www.google.com/advanced_search) also provides a set of search options that are not available as special operators.  Using the Advanced Search page you can also:
                   - filter by language (e.g., find pages only in Spanish, Chinese, German, etc.)
                   - date (filter by time)
                   - usage rights (filter by Creative Commons license)
                   - reading level (find pages that are Basic, Intermediate or Advanced reading levels)

-----------------------------------------------

Francisco Javier Cervigon Ruckauer
